28 January 2026

How Automated Prompt Optimization Unlocks Quality Gains for ML Kit’s GenAI Prompt API

Posted by Chetan Tekur, PM at AI Innovation and Research, Chao Zhao, SWE at AI Innovation and Research, Paul Zhou, Prompt Quality Lead at GCP Cloud AI and Industry Solutions, and Caren Chang, Developer Relations Engineer at Android

Automated Prompt Optimization (APO)

To further help bring your ML Kit Prompt API use cases to production, we are excited to announce Automated Prompt Optimization (APO) targeting on-device models on Vertex AI. Automated Prompt Optimization is a tool that helps you automatically find the best-performing prompt for your use case.

The era of on-device AI is no longer a promise; it is a production reality. With the release of Gemini Nano v3, we are placing unprecedented language understanding and multimodal capabilities directly into the palms of users. Through the Gemini Nano family of models, we have wide coverage of supported devices across the Android ecosystem. But for developers building the next generation of intelligent apps, access to a powerful model is only step one. The real challenge lies in customization: how do you tailor a foundation model to expert-level performance for your specific use case without exceeding the constraints of mobile hardware?

In the server-side world, larger LLMs tend to be highly capable and require less domain adaptation. When adaptation is needed, more advanced options such as LoRA (Low-Rank Adaptation) fine-tuning are feasible. However, Android AICore is architected around a shared, memory-efficient system model. As a result, deploying a custom LoRA adapter for every individual app is challenging for these shared system services.
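The sketch below makes the per-app cost of adapters concrete. It counts the trainable parameters a LoRA adapter adds, using the standard LoRA formulation: a frozen weight W plus a learned low-rank update BA. Several inputs are hypothetical placeholders, not Gemini Nano's real architecture:

- the model width, layer count, and rank
- the assumption of four square projections adapted per layer

```python
# Hypothetical sketch: estimate the extra parameters one LoRA adapter adds.
# LoRA freezes a weight W (d_out x d_in) and learns delta_W = B @ A,
# with B of shape (d_out x r) and A of shape (r x d_in).


def lora_params_per_matrix(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds for one adapted weight matrix."""
    return rank * (d_in + d_out)


def lora_adapter_params(d_model: int, n_layers: int, rank: int,
                        matrices_per_layer: int = 4) -> int:
    """Total adapter parameters, assuming each layer adapts
    `matrices_per_layer` square d_model x d_model projections
    (a simplification chosen for illustration)."""
    per_matrix = lora_params_per_matrix(d_model, d_model, rank)
    return n_layers * matrices_per_layer * per_matrix


if __name__ == "__main__":
    # Hypothetical dimensions, NOT the actual Gemini Nano configuration.
    d_model, n_layers, rank = 2048, 30, 16
    params = lora_adapter_params(d_model, n_layers, rank)
    megabytes_fp16 = params * 2 / 1e6  # 2 bytes per parameter in fp16
    print(f"{params:,} adapter params (~{megabytes_fp16:.1f} MB in fp16) per app")
```

Under these assumed sizes, each app that shipped its own adapter would add weights of this order on top of the shared system model. An optimized prompt adds only text.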

But there is an alternative path that can be equally impactful. By leveraging Automated Prompt Optimization (APO) on Vertex AI, developers can achieve quality approaching that of fine-tuning while working seamlessly within the native Android execution environment. By focusing on better system instructions, APO enables developers to tailor model behavior with greater robustness and scalability than traditional fine-tuning solutions.

Note: Gemini Nano v3 is a quality-optimized version of the highly acclaimed Gemma 3n model. Any prompt optimizations made on the open-source Gemma 3n model will apply to Gemini Nano v3 as well. On supported devices, ML Kit GenAI APIs leverage the nano-v3 model to maximize quality for Android developers.

APO treats the prompt not as static text but as a programmable surface that can be optimized. It leverages server-side models (like Gemini Pro and Flash) to propose prompts, evaluate variations, and find the best one for your specific task. This process employs three technical mechanisms to maximize performance:

Automated Error Analysis: APO analyzes error patterns in the training data to automatically identify specific weaknesses in the initial prompt.
Semantic Instruction Distillation: It analyzes large sets of training examples to distill the "true intent" of a task, creating instructions that more accurately reflect the real data distribution.
Parallel Candidate Testing: Instead of testing one idea at a time, APO generates and tests numerous prompt candidates in parallel to identify the highest-quality one.
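The following is a minimal, hypothetical sketch of two of the mechanisms above: error analysis and parallel candidate testing. It is not the Vertex AI implementation, and several parts are illustration-only assumptions:

- The names `evaluate`, `best_prompt`, `toy_model`, `DATA`, and `CANDIDATES` are invented for this example.
- `toy_model` stands in for a real on-device model call.
- The candidate prompts are hand-written.

In APO, server-side models such as Gemini Pro and Flash propose and distill new instructions. That step is not shown here.

```python
"""Hypothetical sketch of APO-style error analysis and parallel candidate
testing. NOT the Vertex AI implementation; the model is a toy stand-in."""
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Example = tuple[str, str]          # (input text, expected label)
Model = Callable[[str, str], str]  # (system prompt, input) -> output


def evaluate(prompt: str, data: list[Example], run_model: Model) -> tuple[float, Counter]:
    """Score one candidate prompt and collect its error patterns."""
    errors: Counter = Counter()
    correct = 0
    for text, expected in data:
        predicted = run_model(prompt, text)
        if predicted == expected:
            correct += 1
        else:
            # Error analysis: tally (expected, predicted) confusions so the
            # next round of candidates can target the most frequent mistakes.
            errors[(expected, predicted)] += 1
    return correct / len(data), errors


def best_prompt(candidates: list[str], data: list[Example],
                run_model: Model) -> tuple[str, float, Counter]:
    """Evaluate all candidates concurrently and keep the highest scorer."""
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda p: evaluate(p, data, run_model), candidates))
    best = max(range(len(candidates)), key=lambda i: results[i][0])
    score, errors = results[best]
    return candidates[best], score, errors


def toy_model(prompt: str, text: str) -> str:
    """Hypothetical stand-in for an on-device model: it labels refund
    requests as "billing" only when the prompt names that category."""
    if "refund" in text.lower():
        return "billing" if "billing" in prompt else "other"
    return "other"


DATA: list[Example] = [
    ("I want a refund", "billing"),
    ("Where is my order?", "other"),
    ("Refund my last charge", "billing"),
]
CANDIDATES = ["Classify the query.", "Classify the query as billing or other."]

if __name__ == "__main__":
    prompt, score, errors = best_prompt(CANDIDATES, DATA, toy_model)
    print(f"best prompt: {prompt!r} (accuracy {score:.2f})")
    _, baseline_errors = evaluate(CANDIDATES[0], DATA, toy_model)
    print("baseline confusions:", dict(baseline_errors))
```

The confusion tally for the weaker prompt, here two billing queries predicted as "other", is the kind of signal an error-analysis step could feed back into the next round of proposals.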

Why APO Can Approach Fine-Tuning Quality

It is a common misconception that fine-tuning always yields better quality than prompting. For modern foundation models like Gemini Nano v3, prompt engineering can be impactful by itself:

Preserving general capabilities: Fine-tuning (PEFT/LoRA) forces a model's weights to over-index on a specific distribution of data. This often leads to "catastrophic forgetting," where the model gets better at your specific syntax but worse at general logic and safety. APO leaves the weights untouched, preserving the capabilities of the base model.
Instruction Following & Strategy Discovery: Gemini Nano v3 has been rigorously trained to follow complex system instructions. APO exploits this by finding the exact instruction structure that unlocks the model's latent capabilities, often discovering strategies that might be hard for human engineers to find.

To validate this approach, we evaluated APO across diverse production workloads. Across multiple deployed on-device features, APO provided consistent quality gains of 5% to 8.57%, measured as accuracy for classification tasks and BLEU for translation:

Use Case	Task Type	Task Description	Metric	APO Improvement
Topic classification	Text classification	Classify a news article into topics such as finance, sports, etc.	Accuracy	+5%
Intent classification	Text classification	Classify a customer service query into intents	Accuracy	+8.0%
Webpage translation	Text translation	Translate a webpage from English to a local language	BLEU	+8.57%
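To check for a similar lift on your own task, you can compare the baseline and optimized prompts on the same held-out set. The table does not state whether its percentages are absolute (percentage points) or relative gains, so the hypothetical sketch below reports both. The labels and predictions are made up for illustration, and `accuracy` and `lift` are illustrative helper names.

```python
"""Hypothetical sketch: compare a baseline prompt and an optimized prompt
on the same held-out labels, reporting absolute and relative change."""


def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions that exactly match the reference labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def lift(baseline: float, optimized: float) -> tuple[float, float]:
    """Return (absolute change in percentage points, relative change in %)."""
    if baseline <= 0:
        raise ValueError("relative lift is undefined for a zero baseline")
    absolute_pp = (optimized - baseline) * 100
    relative_pct = (optimized - baseline) / baseline * 100
    return absolute_pp, relative_pct


if __name__ == "__main__":
    # Made-up held-out labels and model outputs, for illustration only.
    labels = ["sports", "finance", "sports", "tech", "finance"]
    baseline_preds = ["sports", "sports", "sports", "tech", "tech"]
    optimized_preds = ["sports", "finance", "sports", "tech", "tech"]
    base = accuracy(baseline_preds, labels)
    opt = accuracy(optimized_preds, labels)
    abs_pp, rel = lift(base, opt)
    print(f"baseline={base:.2f} optimized={opt:.2f} "
          f"absolute=+{abs_pp:.1f} pp relative=+{rel:.1f}%")
```

The same comparison applies to translation if you swap `accuracy` for a BLEU scorer. The key design choice is holding the evaluation set fixed so that only the prompt changes.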

A Seamless, End-to-End Developer Workflow

Conclusion

The release of Automated Prompt Optimization (APO) marks a turning point for on-device generative AI. By bridging the gap between foundation models and expert-level performance, we are giving developers the tools to build more robust mobile applications. Whether you are just starting with Zero-Shot Optimization or scaling to production with Data-Driven refinement, the path to high-quality on-device intelligence is now clearer. Launch your on-device use cases to production today with ML Kit’s Prompt API and Vertex AI’s Automated Prompt Optimization.

Google developers blog