28 January 2026
Posted by Chetan Tekur, PM at AI Innovation and Research, Chao Zhao, SWE at AI Innovation and Research, Paul Zhou, Prompt Quality Lead at GCP Cloud AI and Industry Solutions, and Caren Chang, Developer Relations Engineer at Android
To further help bring your ML Kit Prompt API use cases to production, we are excited to announce Automated Prompt Optimization (APO) targeting On-Device models on Vertex AI. Automated Prompt Optimization is a tool that helps you automatically find the optimal prompt for your use cases.
The era of On-Device AI is no longer a promise—it is a production reality. With the release of Gemini Nano v3, we are placing unprecedented language understanding and multimodal capabilities directly into the palms of users. Through the Gemini Nano family of models, we have wide coverage of supported devices across the Android Ecosystem. But for developers building the next generation of intelligent apps, access to a powerful model is only step one. The real challenge lies in customization: How do you tailor a foundation model to expert-level performance for your specific use case without breaking the constraints of mobile hardware?
In the server-side world, the larger LLMs tend to be highly capable and require less domain adaptation. Even when needed, more advanced options such as LoRA (Low-Rank Adaptation) fine-tuning can be feasible options. However, the unique architecture of Android AICore prioritizes a shared, memory-efficient system model. This means that deploying custom LoRA adapters for every individual app comes with challenges on these shared system services.
But there is an alternate path that can be equally impactful. By leveraging Automated Prompt Optimization (APO) on Vertex AI, developers can achieve quality approaching fine-tuning, all while working seamlessly within the native Android execution environment. By focusing on superior system instruction, APO enables developers to tailor model behavior with greater robustness and scalability than traditional fine-tuning solutions.
Note: Gemini Nano V3 is a quality optimized version of the highly acclaimed Gemma 3N model. Any prompt optimizations that are made on the open source Gemma 3N model will apply to Gemini Nano V3 as well. On supported devices, ML Kit GenAI APIs leverage the nano-v3 model to maximize the quality for Android Developers.To further help bring your ML Kit Prompt API use cases to production, we are excited to announce Automated Prompt Optimization (APO) targeting On-Device models on Vertex AI. Automated Prompt Optimization is a tool that helps you automatically find the optimal prompt for your use cases.
The era of On-Device AI is no longer a promise—it is a production reality. With the release of Gemini Nano v3, we are placing unprecedented language understanding and multimodal capabilities directly into the palms of users. Through the Gemini Nano family of models, we have wide coverage of supported devices across the Android Ecosystem. But for developers building the next generation of intelligent apps, access to a powerful model is only step one. The real challenge lies in customization: How do you tailor a foundation model to expert-level performance for your specific use case without breaking the constraints of mobile hardware?
In the server-side world, the larger LLMs tend to be highly capable and require less domain adaptation. Even when needed, more advanced options such as LoRA (Low-Rank Adaptation) fine-tuning can be feasible options. However, the unique architecture of Android AICore prioritizes a shared, memory-efficient system model. This means that deploying custom LoRA adapters for every individual app comes with challenges on these shared system services.
But there is an alternate path that can be equally impactful. By leveraging Automated Prompt Optimization (APO) on Vertex AI, developers can achieve quality approaching fine-tuning, all while working seamlessly within the native Android execution environment. By focusing on superior system instruction, APO enables developers to tailor model behavior with greater robustness and scalability than traditional fine-tuning solutions.
Note: Gemini Nano V3 is a quality optimized version of the highly acclaimed Gemma 3N model. Any prompt optimizations that are made on the open source Gemma 3N model will apply to Gemini Nano V3 as well. On supported devices, ML Kit GenAI APIs leverage the nano-v3 model to maximize the quality for Android Developers
APO treats the prompt not as a static text, but as a programmable surface that can be optimized. It leverages server-side models (like Gemini Pro and Flash) to propose prompts, evaluate variations and find the optimal one for your specific task. This process employs three specific technical mechanisms to maximize performance:
Automated Error Analysis: APO analyzes error patterns from training data to Automatically identify specific weaknesses in the initial prompt.
Semantic Instruction Distillation: It analyzes massive training examples to distill the "true intent" of a task, creating instructions that more accurately reflect the real data distribution.
Parallel Candidate Testing: Instead of testing one idea at a time, APO generates and tests numerous prompt candidates in parallel to identify the global maximum for quality.
It is a common misconception that fine-tuning always yields better quality than prompting. For modern foundation models like Gemini Nano v3, prompt engineering can be impactful by itself:
Preserving General capabilities: Fine-tuning ( PEFT/LoRA) forces a model's weights to over-index on a specific distribution of data. This often leads to "catastrophic forgetting," where the model gets better at your specific syntax but worse at general logic and safety. APO leaves the weights untouched, preserving the capabilities of the base model.
Instruction Following & Strategy Discovery: Gemini Nano v3 has been rigorously trained to follow complex system instructions. APO exploits this by finding the exact instruction structure that unlocks the model's latent capabilities, often discovering strategies that might be hard for human engineers to find.
To validate this approach, we evaluated APO across diverse production workloads. Our validation has shown consistent 5-8% accuracy gains across various use cases.Across multiple deployed on-device features, APO provided significant quality lifts.
Use Case | Task Type | Task Description | Metric | APO Improvement |
Topic classification | Text classification | Classify a news article into topics such as finance, sports, etc | Accuracy | +5% |
Intent classification | Text classification | Classify a customer service query into intents | Accuracy | +8.0% |
Webpage translation | Text translation | Translate a webpage from English to a local language | BLEU | +8.57% |
It is a common misconception that fine-tuning always yields better quality than prompting. For modern foundation models like Gemini Nano v3, prompt engineering can be impactful by itself:
Preserving General capabilities: Fine-tuning ( PEFT/LoRA) forces a model's weights to over-index on a specific distribution of data. This often leads to "catastrophic forgetting," where the model gets better at your specific syntax but worse at general logic and safety. APO leaves the weights untouched, preserving the capabilities of the base model.
Instruction Following & Strategy Discovery: Gemini Nano v3 has been rigorously trained to follow complex system instructions. APO exploits this by finding the exact instruction structure that unlocks the model's latent capabilities, often discovering strategies that might be hard for human engineers to find.
To validate this approach, we evaluated APO across diverse production workloads. Our validation has shown consistent 5-8% accuracy gains across various use cases.Across multiple deployed on-device features, APO provided significant quality lifts.
The release of Automated Prompt Optimization (APO) marks a turning point for on-device generative AI. By bridging the gap between foundation models and expert-level performance, we are giving developers the tools to build more robust mobile applications. Whether you are just starting with Zero-Shot Optimization or scaling to production with Data-Driven refinement, the path to high-quality on-device intelligence is now clearer. Launch your on-device use cases to production today with ML Kit’s Prompt API and Vertex AI’s Automated Prompt Optimization.
Relevant links: