June 17, 2026

At this year's Google I/O, we announced an update for spatial experiences: the Geospatial API is now available as a preview in ARCore for Jetpack XR. By bringing Google's Visual Positioning System (VPS) to Android XR, the Geospatial API lets you anchor digital content to the physical world with sub-meter accuracy and precise orientation in supported areas.* To explore what the Geospatial API could unlock, our team built a demo: the XR Geospatial Tour.
Imagine walking into a new city, putting on a pair of wired XR glasses (like the upcoming XREAL Project Aura), and instantly having a knowledgeable, local guide showing you around. You don't need to stare down at a 2D map. Instead, 3D models gently guide your path, and an intelligent voice tells you about the historical landmarks right in front of you. We combined the Geospatial API, the Gemini API using Firebase AI Logic, Google Maps Grounding, and the Jetpack XR SDK to create a hands-free, immersive walking tour experience.
*Disclaimer: Video and Tour Guide application are for demonstration purposes only. Some sequences have been shortened. Any hardware depicted may be under development; final product details may differ.
Let’s walk through the implementation details and show how we tied these APIs together to build a world-scale spatial experience.
Enhance your navigation experience on XR by combining the power of GPS with the precision of VPS. The accuracy and precise orientation that come with VPS allow 3D waypoints to align with the physical world.
This is where the Geospatial API on Android XR can help you build custom experiences. Using computer vision, VPS aims to provide a GeospatialPose (including latitude, longitude, and heading) that is more accurate than GPS alone.
Here's how we retrieve the user's geospatial pose by converting the device's pose into geospatial coordinates:
    // Retrieve the current geospatial pose from the ARCore session
    val result = geospatial.createGeospatialPoseFromPose(arDevice.state.value.devicePose)
    if (result is CreateGeospatialPoseFromPoseSuccess) {
        val pose = result.pose
        Log.d("VPS", "Accurate Location: ${pose.latitude}, ${pose.longitude}")
    }
Because the entire experience relies on this accuracy, we monitor the horizontalAccuracy and orientationYawAccuracy until they meet our thresholds. If the user is indoors or in an unrecognized area, we prompt them to "walk to an outdoor public space and look around".
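The threshold check described above can be sketched as a small pure function. This is a minimal illustration, not the demo's actual code: `PoseAccuracy`, `LocalizationState`, `evaluateLocalization`, and both threshold values are hypothetical names and numbers chosen for the example, and the units (meters for horizontal accuracy, degrees for yaw accuracy) are an assumption you should confirm against the API reference.

```kotlin
// Hypothetical snapshot of the two accuracy fields named above.
// Units are an assumption: meters for horizontal, degrees for yaw.
data class PoseAccuracy(
    val horizontalAccuracy: Double,
    val orientationYawAccuracy: Double
)

// Illustrative thresholds only; the article does not state the values used.
val maxHorizontalErrorMeters = 1.0
val maxYawErrorDegrees = 10.0

sealed interface LocalizationState {
    object Ready : LocalizationState
    data class NeedsBetterView(val prompt: String) : LocalizationState
}

// Returns Ready only when both accuracy values are within their thresholds.
// NaN values fail both comparisons, so they fall through to NeedsBetterView.
fun evaluateLocalization(
    accuracy: PoseAccuracy,
    maxHorizontal: Double = maxHorizontalErrorMeters,
    maxYaw: Double = maxYawErrorDegrees
): LocalizationState {
    val goodEnough = accuracy.horizontalAccuracy <= maxHorizontal &&
        accuracy.orientationYawAccuracy <= maxYaw
    return if (goodEnough) {
        LocalizationState.Ready
    } else {
        LocalizationState.NeedsBetterView("Walk to an outdoor public space and look around")
    }
}
```

In an app, you would likely call a function like this each time a new pose arrives and only start the tour once it reports `Ready`, showing the prompt text in the meantime.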
Once we have a location, we use the Gemini API through Firebase AI Logic to prompt the Gemini model to act as a local tour guide. We pass the user's coordinates to the model and ask it to output a structured JSON response containing nearby walking tours:
    val configForTools = ToolConfig(
        functionCallingConfig = null,
        retrievalConfig = retrievalConfig {
            latLng = FirebaseLatLng(pose.latitude, pose.longitude)
            languageCode = "en"
        }
    )

    val responseJsonSchema = Schema.obj(
        mapOf(
            "locationIntro" to Schema.string(),
            "tours" to Schema.array(
                Schema.obj(
                    mapOf(
                        "title" to Schema.string(),
                        "description" to Schema.string(),
                        "stops" to Schema.array(
                            Schema.obj(
                                mapOf(
                                    "name" to Schema.string(),
                                    "detailedName" to Schema.string(),
                                    "description" to Schema.string()
                                )
                            )
                        )
                    )
                )
            )
        )
    )

    val model = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
        modelName = "gemini-3.5-flash",
        tools = listOf(Tool.googleMaps()),
        // Pass the tool config so the Maps tool receives the user's location
        toolConfig = configForTools,
        generationConfig = generationConfig {
            responseMimeType = "application/json"
            responseSchema = responseJsonSchema
        }
    )

    val result = model.generateContent(
        "The user is at latitude ${pose.latitude} and longitude ${pose.longitude}. " +
            "Generate exactly 3 diverse tours near this location (e.g., historical, food, nature). " +
            "All tour ideas should be walking distance only."
    )
Large Language Models are great at generating rich descriptions, but they can sometimes hallucinate exact latitude/longitude coordinates. To address this, we used Google Maps Grounding so that the places the model suggests are tied to real Google Maps data.
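As an extra defensive layer on top of grounding, an app could also reject any stop whose resolved coordinates are implausibly far from the user. The sketch below is not part of the demo described here; it is a plain haversine distance check, and `GeoPoint`, `haversineMeters`, `isWithinWalkingDistance`, and the 2 km default are all hypothetical choices made for illustration. It assumes you have already resolved each stop to a latitude and longitude.

```kotlin
import kotlin.math.PI
import kotlin.math.asin
import kotlin.math.cos
import kotlin.math.pow
import kotlin.math.sin
import kotlin.math.sqrt

// Hypothetical coordinate holder, independent of any SDK type.
data class GeoPoint(val latitude: Double, val longitude: Double)

// Mean Earth radius in meters, used by the haversine formula.
val earthRadiusMeters = 6_371_000.0

private fun Double.toRadians(): Double = this * PI / 180.0

// Great-circle distance between two points, in meters.
fun haversineMeters(a: GeoPoint, b: GeoPoint): Double {
    val dLat = (b.latitude - a.latitude).toRadians()
    val dLng = (b.longitude - a.longitude).toRadians()
    val h = sin(dLat / 2).pow(2) +
        cos(a.latitude.toRadians()) * cos(b.latitude.toRadians()) * sin(dLng / 2).pow(2)
    return 2 * earthRadiusMeters * asin(sqrt(h))
}

// The 2 km default is an assumed walking-distance cutoff, not a value from the article.
fun isWithinWalkingDistance(
    user: GeoPoint,
    stop: GeoPoint,
    maxMeters: Double = 2_000.0
): Boolean = haversineMeters(user, stop) <= maxMeters
```

A filter like `stops.filter { isWithinWalkingDistance(userLocation, it.location) }` would then drop outliers before any waypoint is placed, where `stops` and `location` stand in for whatever data model your app uses.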
To make the tour guide feel truly present, we implemented dynamic voiceovers.
With the gemini-2.5-flash-tts model, we can set the generation config so that the model natively returns audio data instead of just text. Here's how to request ResponseModality.AUDIO:
    val ttsModel = Firebase.ai(backend = GenerativeBackend.googleAI()).generativeModel(
        modelName = "gemini-2.5-flash-tts",
        generationConfig = generationConfig {
            // Instruct the model to return Audio
            responseModalities = listOf(ResponseModality.AUDIO)
        }
    )

    val response = ttsModel.generateContent("Say in a neutral but positive voice:\n$prompt")

    // Extract the raw audio bytes from the response
    val audioBytes = response.candidates.firstOrNull()
        ?.content?.parts
        ?.filterIsInstance<InlineDataPart>()
        ?.firstOrNull { it.mimeType.contains("audio") }
        ?.inlineData
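If the returned bytes are headerless PCM, they need a container before most players or file tools will accept them. The sketch below wraps raw samples in a standard 44-byte WAV header. It assumes 16-bit, mono, 24 kHz little-endian PCM, which is an assumption to verify against the `mimeType` on the returned part; `wrapPcmInWav` is a hypothetical helper name and not part of any SDK.

```kotlin
import java.nio.ByteBuffer
import java.nio.ByteOrder

// Prepends a canonical 44-byte RIFF/WAVE header to raw PCM samples.
// The defaults (24 kHz, mono, 16-bit) are assumptions about the audio format.
fun wrapPcmInWav(
    pcm: ByteArray,
    sampleRate: Int = 24_000,
    channels: Int = 1,
    bitsPerSample: Int = 16
): ByteArray {
    val blockAlign = channels * bitsPerSample / 8
    val byteRate = sampleRate * blockAlign

    val header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN)
    header.put("RIFF".toByteArray(Charsets.US_ASCII))
    header.putInt(36 + pcm.size)              // Size of everything after this field
    header.put("WAVE".toByteArray(Charsets.US_ASCII))
    header.put("fmt ".toByteArray(Charsets.US_ASCII))
    header.putInt(16)                         // Size of the fmt chunk for PCM
    header.putShort(1.toShort())              // Audio format 1 = uncompressed PCM
    header.putShort(channels.toShort())
    header.putInt(sampleRate)
    header.putInt(byteRate)
    header.putShort(blockAlign.toShort())
    header.putShort(bitsPerSample.toShort())
    header.put("data".toByteArray(Charsets.US_ASCII))
    header.putInt(pcm.size)                   // Size of the sample data

    return header.array() + pcm
}
```

On a device you might instead stream the samples straight to an audio output configured with the same sample rate and encoding; wrapping them as WAV is mainly useful for caching narration clips or inspecting them during development.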
The final piece of the puzzle is rendering this data in the user's field of view. The Jetpack XR SDK makes it intuitive to transition from a 2D Android UI to spatial computing.
We used Jetpack Compose for XR to build spatial components. To represent points of interest along the tour, we built a Composable called InfoSphere, which contains a GltfModel of a 3D orb that floats in space and can be interacted with to reveal information.
Using the Jetpack XR SDK, we can place 3D models alongside the Compose UI using SpatialBox and SceneCoreEntity. We also used InteractableComponent to respond to user taps.
    @Composable
    fun InfoSphere(
        content: InfoBubbleContent,
        session: Session,
        sphereModel: GltfModel,
        isSelected: Boolean,
        onClick: () -> Unit
    ) {
        // SpatialBox lets us arrange 3D components and SpatialPanels together
        SpatialBox(
            SubspaceModifier.offset(x = 2.dp, y = 1.dp, z = (-3).dp) // Positioned in 3D space
        ) {
            // Smoothly animate the visibility of our 2D Compose UI Panel
            AnimatedSpatialVisibility(visible = isSelected) {
                SpatialPanel {
                    InfoBubble(content) // Regular 2D Compose UI
                }
            }

            // Render our interactive 3D sphere
            SceneCoreEntity(
                factory = {
                    GltfModelEntity.create(session, sphereModel).also { entity ->
                        // Make the 3D model respond to user taps
                        entity.addComponent(
                            InteractableComponent.create(session) { inputEvent ->
                                if (inputEvent.action == InputEvent.Action.UP) {
                                    onClick()
                                }
                            }
                        )
                    }
                }
            )
        }
    }
By combining AnimatedSpatialVisibility for traditional Compose UI surfaces with SceneCoreEntity 3D elements, we're able to seamlessly blend data into the physical world.
Building the XR Geospatial Tour app showed us that the barrier to entry for world-scale spatial experiences is lower than ever for Android developers. With the Geospatial API now available in preview on Android XR, your apps can seamlessly understand the physical world around them. By combining Compose for XR’s APIs with the high-precision location data of VPS and the generative capabilities of Gemini, we can create experiences that understand both where the user is and what they are looking at.
To help you get hands-on with Android XR, we are thrilled to open applications for the Android XR Developer Catalyst Program, which includes XREAL Project Aura. Starting today, you can apply to get access to an XREAL Project Aura devkit or our display glasses devkit over the coming months!