05 March 2024

Introducing a new Text-To-Speech engine on Wear OS

Posted by Ouiam Koubaa – Product Manager and Yingzhe Li – Software Engineer

Today, we’re excited to announce the release of a new Text-To-Speech (TTS) engine that is performant and reliable. Text-to-speech turns text into natural-sounding speech in more than 50 languages, powered by Google’s machine learning (ML) technology. The new text-to-speech engine on Wear OS uses smaller and more efficient prosody ML models to deliver faster synthesis on Wear OS devices.

Use cases for Wear OS’s text-to-speech include accessibility services, coaching cues for exercise apps, navigation cues, and reading incoming alerts aloud through the watch speaker or Bluetooth-connected headphones. The engine is meant for brief interactions, so it shouldn’t be used to read aloud a long article or a long podcast summary.

How to use Wear OS’s TTS

Text-to-speech has long been supported on Android. Wear OS’s new TTS has been tuned to be performant and reliable on low-memory devices. All the Android APIs are still the same, so developers use the same process to integrate it into a Wear OS app. For example, TextToSpeech#speak can be used to speak specific text. The new engine is available on devices that run Wear OS 4 or higher.
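As a minimal sketch of the TextToSpeech#speak call mentioned above, the helper below queues a short cue on an engine that has already finished initializing. The function name speakCue and the utterance ID are illustrative choices, not part of the Android API.

```kotlin
import android.speech.tts.TextToSpeech

// Hypothetical helper: speaks a short cue, replacing anything already queued.
// Assumes `tts` has already reported TextToSpeech.SUCCESS in its OnInitListener.
fun speakCue(tts: TextToSpeech, cue: String): Boolean {
    val result = tts.speak(
        cue,
        TextToSpeech.QUEUE_FLUSH, // drop pending utterances so the cue is not delayed
        null,                     // no extra parameters
        "cue-utterance"           // illustrative utterance ID
    )
    // speak() only reports whether the request was queued, not that playback finished
    return result == TextToSpeech.SUCCESS
}
```

Passing TextToSpeech.QUEUE_ADD instead would append the cue after anything already queued, which may suit a sequence of navigation prompts.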

When the user interacts with the Wear OS TTS for the first time following a device boot, the synthesis engine is ready in about 10 seconds. For special cases where developers want the watch to speak immediately after opening an app or launching an experience, the following code can be used to pre-warm the TTS engine before any synthesis requests come in.

import android.speech.tts.TextToSpeech
import android.util.Log
import java.io.File

// Member of an Activity or Service that defines TAG (assumed here)
private lateinit var tts: TextToSpeech

private fun initTtsEngine() {
    // Callback invoked once the TextToSpeech connection is set up
    val callback = TextToSpeech.OnInitListener { status ->
        if (status == TextToSpeech.SUCCESS) {
            Log.i(TAG, "TTS client initialized successfully")

            // Get the default TTS voice
            val defaultVoice = tts.voice
            if (defaultVoice == null) {
                Log.w(TAG, "defaultVoice == null")
                return@OnInitListener
            }

            // Set the TTS engine to use the default voice's locale
            tts.language = defaultVoice.locale

            try {
                // Create a temporary file to synthesize sample text into
                val tempFile =
                    File.createTempFile("tmpsynthesize", null, applicationContext.cacheDir)

                // Synthesize sample text to our file; this warms up the engine
                tts.synthesizeToFile(
                    /* text= */ "1 2 3", // Some sample text
                    /* params= */ null, // No params necessary for a sample request
                    /* file= */ tempFile,
                    /* utteranceId= */ "sampletext"
                )

                // And mark the file for cleanup
                tempFile.deleteOnExit()
            } catch (e: Exception) {
                Log.e(TAG, "Unhandled exception: ", e)
            }
        } else {
            Log.e(TAG, "TTS initialization failed with status $status")
        }
    }

    tts = TextToSpeech(applicationContext, callback)
}

When you are done using TTS, you can release the engine by calling tts.shutdown() in your activity’s onDestroy() method. Call it as well whenever an app that uses TTS is closing, so that the engine’s resources are released.
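The release step could look roughly like the sketch below. The activity name WorkoutActivity and the nullable tts field are assumptions made for illustration; only the lifecycle handling is shown.

```kotlin
import android.app.Activity
import android.speech.tts.TextToSpeech

// Hypothetical activity; only the TTS lifecycle handling is shown.
class WorkoutActivity : Activity() {
    // Created elsewhere, for example in onCreate()
    private var tts: TextToSpeech? = null

    override fun onDestroy() {
        // Interrupt any utterance still playing, then release the engine
        tts?.stop()
        tts?.shutdown()
        tts = null
        super.onDestroy()
    }
}
```

Clearing the reference after shutdown() is a defensive choice here, so that later code cannot accidentally call into an engine that has been released.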

Languages and Locales

By default, Wear OS TTS includes 7 pre-loaded languages in the system image: English, Spanish, French, Italian, German, Japanese, and Mandarin Chinese. OEMs may choose to preload a different set of languages. You can check which languages are available by using TextToSpeech#getAvailableLanguages(). During watch setup, if the user selects a system language whose voice file is not pre-loaded, the watch automatically downloads the corresponding voice file the first time the user connects to Wi-Fi while charging their watch.
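One way to act on the set returned by TextToSpeech#getAvailableLanguages() is sketched below. The helper supportsLanguage is a hypothetical name, and it assumes the engine reports locales with two-letter language codes such as "en" or "ja"; it is kept free of Android types so the matching logic can be read on its own.

```kotlin
import java.util.Locale

// Hypothetical helper: true if any available locale shares the wanted language code.
// Assumption: the engine reports two-letter language codes such as "en" or "ja".
fun supportsLanguage(available: Set<Locale>, wanted: Locale): Boolean =
    available.any { it.language == wanted.language }

// On a device, pass the engine's own set once initialization has succeeded:
//   val canSpeakJapanese = supportsLanguage(tts.availableLanguages ?: emptySet(), Locale.JAPAN)
```

Matching on the language code alone treats regional variants (for example Canadian French and French as spoken in France) as interchangeable, which may or may not be acceptable for a given app.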

There are limited cases where the speech output may differ from the user’s system language. For example, in a scenario where a safety app uses TTS to call emergency responders, developers might want to synthesize speech in the language of the locale the user is in, not the language their watch is set to. To synthesize text in a different language from system settings, use TextToSpeech#setLanguage(java.util.Locale).
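A rough sketch of switching languages before speaking follows. The helper name speakInLocale and the utterance ID are illustrative; the result codes checked are the ones setLanguage is documented to return when a language cannot be used.

```kotlin
import android.speech.tts.TextToSpeech
import java.util.Locale

// Hypothetical helper: switches the engine to `locale` and speaks `text`.
// Returns false if the engine cannot speak that language or the request is not queued.
fun speakInLocale(tts: TextToSpeech, text: String, locale: Locale): Boolean {
    val result = tts.setLanguage(locale)
    if (result == TextToSpeech.LANG_MISSING_DATA ||
        result == TextToSpeech.LANG_NOT_SUPPORTED
    ) {
        // The voice data is absent or the language is unsupported; let the caller fall back
        return false
    }
    val queued = tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "locale-utterance")
    return queued == TextToSpeech.SUCCESS
}
```

Because setLanguage changes the engine's language for subsequent requests, an app that only needs the alternate language briefly would likely want to restore its previous setting afterwards.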

Conclusion

Your Wear OS apps now have the power to talk, either directly from the watch’s speakers or through Bluetooth-connected headphones. Learn more about using TTS.

We look forward to seeing how you use the text-to-speech engine to create more helpful and engaging experiences for your users on Wear OS!

Copyright 2023 Google LLC.
SPDX-License-Identifier: Apache-2.0
