20 May 2025
We are excited to announce a set of on-device GenAI APIs, as part of ML Kit, to help you integrate Gemini Nano in your Android apps.
To start, we are releasing four new APIs: Summarization, Proofreading, Rewriting, and Image Description.
GenAI APIs are high-level APIs that allow for easy integration, similar to existing ML Kit APIs. This means you can expect quality results out of the box without extra effort for prompt engineering or fine-tuning for specific use cases.
Because GenAI APIs run on-device, data is processed locally, inference works without a network connection, and there is no server-side inference cost.
To prevent misuse, we also added safety protection at multiple layers, including base model training, safety-aware LoRA fine-tuning, input and output classifiers, and safety evaluations.
There are four main components that make up each of the GenAI APIs: the Gemini Nano base model, feature-specific LoRA adapter models produced by fine-tuning, the on-device AICore runtime, and an evaluation pipeline.
Together, these components make up the high-level GenAI APIs that simplify the effort needed to integrate Gemini Nano in your Android app.
For each API, we formulate a benchmark score based on the evaluation pipeline mentioned above. This score is based on attributes specific to a task. For example, when evaluating the summarization task, one of the attributes we look at is "grounding" (i.e., factual consistency of the generated summary with the source content).
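To build intuition for what an attribute like grounding measures, a toy check might score the fraction of a summary's content words that also appear in the source. This is a hypothetical sketch for illustration only, not the actual evaluation pipeline:

```kotlin
// Hypothetical grounding score: fraction of summary content words
// that also appear in the source text. For intuition only; the real
// evaluation pipeline uses task-specific attribute raters.
fun toyGroundingScore(source: String, summary: String): Double {
    val tokenize = { s: String ->
        s.lowercase().split(Regex("\\W+")).filter { it.length > 3 }.toSet()
    }
    val sourceWords = tokenize(source)
    val summaryWords = tokenize(summary)
    if (summaryWords.isEmpty()) return 0.0
    return summaryWords.count { it in sourceWords }.toDouble() / summaryWords.size
}

fun main() {
    val source = "GenAI APIs run on-device and provide quality results out of the box."
    // Every content word of this summary appears in the source -> 1.0
    println(toyGroundingScore(source, "GenAI APIs provide quality results"))
    // No content word appears in the source -> 0.0
    println(toyGroundingScore(source, "Bananas cure everything overnight"))
}
```

A production metric would use semantic comparison rather than word overlap, but the shape is the same: a per-attribute score aggregated into the benchmark numbers below.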
To provide out-of-the-box quality for GenAI APIs, we applied feature-specific fine-tuning on top of the Gemini Nano base model. This resulted in an increase in the benchmark score of each API, as shown below:
| Use case in English | Gemini Nano Base Model | ML Kit GenAI API |
|---|---|---|
| Summarization | 77.2 | 92.1 |
| Proofreading | 84.3 | 90.2 |
| Rewriting | 79.5 | 84.1 |
| Image Description | 86.9 | 92.3 |
In addition, here is a quick reference for how the APIs perform on a Pixel 9 Pro:

| | Prefix Speed (input processing rate) | Decode Speed (output generation rate) |
|---|---|---|
| Text-to-text | 510 tokens/second | 11 tokens/second |
| Image-to-text | 510 tokens/second + 0.8 seconds for image encoding | 11 tokens/second |
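These throughput figures translate into a rough end-to-end latency estimate: summarizing a 1,000-token article into a 50-token summary would take about 1000/510 ≈ 2.0 s of input processing plus 50/11 ≈ 4.5 s of decoding, roughly 6.5 s total. The token counts in this sketch are illustrative assumptions, not measured values:

```kotlin
// Rough latency estimate from the Pixel 9 Pro throughput table above.
// Input/output token counts are illustrative assumptions.
fun estimateLatencySeconds(
    inputTokens: Int,
    outputTokens: Int,
    prefixTokensPerSec: Double = 510.0,
    decodeTokensPerSec: Double = 11.0,
    imageEncodingSec: Double = 0.0, // use 0.8 for image-to-text
): Double = inputTokens / prefixTokensPerSec +
    outputTokens / decodeTokensPerSec +
    imageEncodingSec

fun main() {
    val seconds = estimateLatencySeconds(inputTokens = 1000, outputTokens = 50)
    println("Estimated text-to-text latency: %.1f s".format(seconds)) // ~6.5 s
}
```

Note that decode speed dominates: trimming the requested output length (for example, a one-bullet summary) reduces latency far more than shortening the input.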
This is an example of implementing the GenAI Summarization API to get a one-bullet summary of an article:
```kotlin
val articleToSummarize = "We are excited to announce a set of on-device generative AI APIs..."

// Define task with desired input and output format
val summarizerOptions = SummarizerOptions.builder(context)
    .setInputType(InputType.ARTICLE)
    .setOutputType(OutputType.ONE_BULLET)
    .setLanguage(Language.ENGLISH)
    .build()
val summarizer = Summarization.getClient(summarizerOptions)

suspend fun prepareAndStartSummarization(context: Context) {
    // Check feature availability. Status will be one of the following:
    // UNAVAILABLE, DOWNLOADABLE, DOWNLOADING, AVAILABLE
    val featureStatus = summarizer.checkFeatureStatus().await()

    if (featureStatus == FeatureStatus.DOWNLOADABLE) {
        // Download feature if necessary.
        // If downloadFeature is not called, the first inference request
        // will also trigger the feature to be downloaded if it's not
        // already downloaded.
        summarizer.downloadFeature(object : DownloadCallback {
            override fun onDownloadStarted(bytesToDownload: Long) {}

            override fun onDownloadFailed(e: GenAiException) {}

            override fun onDownloadProgress(totalBytesDownloaded: Long) {}

            override fun onDownloadCompleted() {
                startSummarizationRequest(articleToSummarize, summarizer)
            }
        })
    } else if (featureStatus == FeatureStatus.DOWNLOADING) {
        // Inference request will automatically run once feature is
        // downloaded.
        // If Gemini Nano is already downloaded on the device, the
        // feature-specific LoRA adapter model will be downloaded very
        // quickly. However, if Gemini Nano is not already downloaded,
        // the download process may take longer.
        startSummarizationRequest(articleToSummarize, summarizer)
    } else if (featureStatus == FeatureStatus.AVAILABLE) {
        startSummarizationRequest(articleToSummarize, summarizer)
    }
}

fun startSummarizationRequest(text: String, summarizer: Summarizer) {
    // Create task request
    val summarizationRequest = SummarizationRequest.builder(text).build()

    // Start summarization request with streaming response
    summarizer.runInference(summarizationRequest) { newText ->
        // Show new text in UI
    }

    // You can also get a non-streaming response from the request
    // val summarizationResult = summarizer.runInference(summarizationRequest)
    // val summary = summarizationResult.get().summary
}

// Be sure to release the resource when no longer needed
// For example, on viewModel.onCleared() or activity.onDestroy()
summarizer.close()
```
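The availability handling above boils down to a small decision over the four feature statuses. The following stand-alone sketch models that branching with a stand-in enum so it can be unit-tested without a device; the enum and return values here are illustrative, not ML Kit types:

```kotlin
// Stand-in for ML Kit's FeatureStatus, for illustration only.
enum class Status { UNAVAILABLE, DOWNLOADABLE, DOWNLOADING, AVAILABLE }

// Mirrors the branching in prepareAndStartSummarization:
// DOWNLOADABLE -> start the download (inference runs on completion),
// DOWNLOADING / AVAILABLE -> issue the inference request right away,
// UNAVAILABLE -> the feature cannot run on this device.
fun nextAction(status: Status): String = when (status) {
    Status.DOWNLOADABLE -> "download-then-infer"
    Status.DOWNLOADING, Status.AVAILABLE -> "infer"
    Status.UNAVAILABLE -> "unsupported"
}
```

Keeping this decision in a pure function like the one above also makes it easy to surface a distinct UI state (e.g., a progress indicator while downloading) for each branch.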
For more examples of implementing the GenAI APIs, check out the official documentation and the samples on GitHub.
Here is some guidance on how to best use the current GenAI APIs:
For Summarization, consider:
For Proofreading and Rewriting APIs, consider utilizing them during the content creation process for short content below 256 tokens to help with tasks such as:
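Since Proofreading and Rewriting work best on content below 256 tokens, a lightweight client-side guard can skip the call for longer drafts. The 4-characters-per-token rule of thumb below is a common heuristic and an assumption of this sketch, not ML Kit's tokenizer:

```kotlin
// Rough token estimate using the common ~4 characters/token heuristic.
// This is an approximation for gating only, not ML Kit's tokenizer.
fun estimatedTokens(text: String): Int = (text.length + 3) / 4

fun fitsShortContentLimit(text: String, maxTokens: Int = 256): Boolean =
    estimatedTokens(text) <= maxTokens

fun main() {
    val draft = "Quick note to polish before sending."
    println(fitsShortContentLimit(draft)) // true: well under 256 tokens
}
```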
For the Image Description API, consider it for:
Envision is an app that verbalizes the visual world to help people who are blind or have low vision lead more independent lives. A common use case in the app is for users to take a picture to have a document read out loud. Utilizing the GenAI Summarization API, Envision is now able to get a concise summary of a captured document. This significantly enhances the user experience by allowing them to quickly grasp the main points of documents and determine if a more detailed reading is desired, saving them time and effort.
GenAI APIs are available on Android devices using optimized MediaTek Dimensity, Qualcomm Snapdragon, and Google Tensor platforms through AICore. For a comprehensive list of devices that support GenAI APIs, refer to our official documentation.
Start implementing GenAI APIs in your Android apps today with guidance from our official documentation and samples on GitHub: AI Catalog GenAI API Samples with Compose, ML Kit GenAI APIs Quickstart.