Google Just Turned Street View Into a Video Game

Reddit r/singularity 05/20/26, 07:56 AM Products

google-maps street-view genie-3 video-generation interactive-3d real-time ai-gaming

Summary

Google’s Genie 3 real-time video generator, unveiled at the I/O conference, transforms Google Maps Street View imagery into an interactive 3D world, letting users explore real-world locations freely as if playing a game.

Could this be how GTA 7 will be made given that it is probably almost 20 years away?

Original Article

Cached at: 05/20/26, 10:29 AM

### TL;DR

Google's Genie 3, a real-time video generator unveiled at I/O, can use Google Maps Street View images as a base map to generate interactive 3D worlds, letting users freely explore real-world scenes like in a video game.

---

## From Holodeck to Reality: Genie 3 Brings Street View to Life

I've always wanted a holodeck in the real world: a way to drop complex reality into a simulator and do whatever you want inside it. Google's Genie 3 announcement at I/O is a step in that direction. Genie 3 is Google's real-time video generator, and now you can anchor it to Google Maps imagery. That means you can reference actual Street View photos of a physical area and use them as the foundation for generated content.

I got early access, so let me walk you through how it works with a few different examples. Most importantly, let's look at where this technology is heading, because I believe it gives us a clear preview of what interactive media anchored in the physical world will look like.

---

## How It Works: Pick a Location, Real-Time Generation

Here's the Genie 3 interface. There's a button that says "Select a location from Google Maps." The idea is simple: the system references real-world panoramic images, then uses an autoregressive video model to generate each next frame in real time, so the interaction feels seamless.

---

## Demo Examples: From GTA Racing to a Raccoon on a Scooter

### GTA-Themed F1 Car

The first thing I wanted to do was simulate something like GTA. Maybe GTA 7 will actually look like this, right? I basically prompted the system to generate a Google Maps-themed F1 car and then race down the Las Vegas Strip. You can see it has a speedometer. The system even has checkpoints built in. Pretty wild.

What's especially cool is that it's just referencing a panoramic image. They haven't even incorporated aerial imagery yet. Once that's included, you'll be able to generate a realistic, almost one-to-one representation of a real location and navigate freely within it.
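
To make the "panorama plus autoregressive model" idea a bit more concrete, here's a minimal Python sketch of the generation loop. Everything in it is an assumption for illustration: `rollout`, `toy_predictor`, and the flat-list `Frame` type are hypothetical names, and the predictor is simple arithmetic rather than a learned video model. It only shows the shape of the loop, where each new frame depends on the reference image, the recent frames, and the user's input, and is then fed back in as context.

```python
from collections import deque
from typing import Callable, Deque, List, Sequence

# Toy stand-in for an image: a flat list of pixel values (assumption).
Frame = List[float]
Predictor = Callable[[Frame, Sequence[Frame], float], Frame]


def rollout(
    predict_next: Predictor,
    reference: Frame,
    actions: Sequence[float],
    context_len: int = 4,
) -> List[Frame]:
    """Generate one frame per user action, autoregressively.

    The context window starts with the reference frame (the panorama stand-in)
    and keeps only the most recent `context_len` frames.
    """
    context: Deque[Frame] = deque([reference], maxlen=context_len)
    frames: List[Frame] = []
    for action in actions:
        # The next frame depends on the reference, recent frames, and the action.
        frame = predict_next(reference, list(context), action)
        frames.append(frame)
        # Feed the generated frame back in: this is the autoregressive step.
        context.append(frame)
    return frames


def toy_predictor(reference: Frame, context: Sequence[Frame], action: float) -> Frame:
    """Hypothetical placeholder for a learned model (NOT how Genie 3 works).

    Blends the latest frame with the reference so output stays anchored to it,
    then shifts every pixel by the action value.
    """
    last = context[-1]
    return [0.5 * (px + ref) + action for px, ref in zip(last, reference)]


# Two "pixels" and two user inputs, just to exercise the loop.
demo_frames = rollout(toy_predictor, reference=[0.0, 1.0], actions=[1.0, 0.0])
```

In a real system the predictor would be a large neural network and the frames would be images, but the control flow would look roughly like this: one prediction per input, with the output appended to a bounded context.
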
### Raccoon on a Scooter

In this example, I prompted a raccoon riding a scooter around the Palace of Fine Arts. Look over there: that's the Palace of Fine Arts. You can see the thing moving fast, with shadows being captured. What's cool is that this isn't even Gaussian Splatting, right? Google doesn't need to convert it to Gaussian Splatting. It's just the panoramic image plus an autoregressive video model trained on tons of YouTube videos. It generates everything on the fly for you.

### Flying Inside the Palace of Fine Arts

I wanted to try another animal, jumping around inside the Palace of Fine Arts. So I picked another panorama to basically let me fly around.

### Pegman Running

As a former Google Maps employee, I had to pay homage by making Pegman run around the Ferry Building. You can see how cool this looks. Again, this is real-time video generation, right? If you want to prototype an idea, say, something you plan to actually build later in a game engine or with Gaussian Splatting, you can try it here first and get a feel for what it looks like.

### Lady Bird Lake, Austin

I now live in Austin, Texas, so I had to try recreating a very familiar scene: a bunch of tattooed guys running around Lady Bird Lake. I think this turned out particularly well because, amusingly, there's the Google building over there. When I pan over to show you the skyline, you can see how realistic it actually is.

Of course, Lady Bird Lake is a bit dirty. If you live in Austin, you generally don't want to jump in. But maybe as an avatar, I can. So now we're in the water, and I wanted to try driving this boat on Lady Bird Lake. The cool thing is, Google has many specialized captures. This one was taken from a boat on the lake. So this is exactly what it looks like under that bridge.

### Indoor Spaces and the White House

But it's not just outdoor captures. Street View also has plenty of indoor captures. This is the White House.
I'm actually walking around inside the White House.

### Non-Realistic Creative Play

And by the way, you don't have to go for realism, right? If you want to, say, put the Golden Gate Bridge underwater and suddenly become a diver, you can do that. If you want to imagine what a city would look like covered in heavy snow, you can do that too.

### Historical Imagery

Just as I showed in my earlier deep dive into Genie 3, you can do a lot by anchoring content layers on top of the real world, even using very old historical captures. For example, take this aerial photo of San Francisco: you can fly through it.

---

## Technical Analysis: Why This Isn't Just a Game

You might be thinking, what does this have to do with me? Well, "world model" is a vague term right now, because everyone claims they're building one. There's the 3D Gaussian Splatting crowd. Obviously, there's the old-school SLAM computer vision camp. There's the JEPA crowd arguing with the LLM crowd. And of course, there's Google Maps.

In this case, what we're seeing is essentially Google Maps and Video Gen having a baby. In fact, a few months ago, there was a paper called the "Soleworld Model" that basically did this. How? Take a video generation model (not a real-time autoregressive video generator, but a diffusion model) and condition it on Street View. The idea is that you have all these Street View images, and now you can use these video models to make everything move.

The advantage, of course, is that you can now have free-roaming simulations, right? You're no longer limited to the path of the Street View capture vehicle. You can go far beyond what traditional 3D reconstruction can achieve. And since it's a video generator, you can also add crazy things, like Godzilla suddenly appearing, or a giant tsunami heading your way, or an alien portal opening. You can even turn day into night, right?
So by pulling reality into the latent space, you can now edit it and do things that are difficult or tedious with traditional tools, all through text prompts, image references, and so on.

---

## Current Limitations and Future Directions

I don't think what Google showed today is doing it this way under the hood, but I suspect that's where it's headed. For example, here you can see the nearest Street View panoramas. Say you're moving along a trajectory. The system retrieves the nearest panorama and keeps feeding it into the context. That way, the model knows what's physically around it and doesn't just make things up.

Right now, it seems Google is just loading the nearest set of panoramas, which makes sense because this is real-time video generation, right? The model predicts the next frame autoregressively. Loading everything into context might be heavy, but I imagine that's the next step. That's why the scene here is completely wrong. When I walk to the other side of the Palace of Fine Arts, like this dome over here, there aren't actually houses behind it. If they used a similar approach, they would have loaded those panoramas.

You might now be thinking, why not use actual Gaussian Splatting or something like World Labs to create a static scene and then place a 3D model inside? Yes, you could. Those pipelines are great when you need that level of control. But you might have noticed that in those scenes, everything is static, right? Meanwhile, the real world is full of life and motion. So when you combine the most advanced generative AI models with real-world imagery, you truly get the best of both worlds.

I should also point out the current quality. Right now, Parker says these real-time video models are about one or two versions behind offline video models. So v3.1 will offer better generation quality. But we have interactivity now, right? That lets you dial in the specific camera angle you want, and you can always upscale it in other models later.
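
The retrieval idea from this section (as you move along a trajectory, fetch the nearest panorama and keep feeding it into the context) can be sketched in a few lines of Python. This is only a guess at the general shape, not Google's or the paper's implementation: the `Panorama` class, the panorama IDs, and the coordinates are all made up for illustration, and a brute-force haversine search stands in for whatever spatial index a real system would use.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


@dataclass(frozen=True)
class Panorama:
    pano_id: str  # hypothetical identifier, not a real Street View ID
    lat: float    # degrees
    lon: float    # degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def nearest_panoramas(
    panos: Sequence[Panorama], lat: float, lon: float, k: int = 1
) -> List[Panorama]:
    """Return the k panoramas closest to (lat, lon), nearest first (brute force)."""
    return sorted(panos, key=lambda p: haversine_m(lat, lon, p.lat, p.lon))[:k]


def context_along_trajectory(
    panos: Sequence[Panorama], trajectory: Sequence[Tuple[float, float]], k: int = 1
) -> List[List[str]]:
    """For each camera position, list the IDs that would be fed into the context."""
    return [
        [p.pano_id for p in nearest_panoramas(panos, lat, lon, k)]
        for lat, lon in trajectory
    ]


# Made-up capture points and camera path, purely illustrative.
PANOS = [
    Panorama("A", 37.8020, -122.4490),
    Panorama("B", 37.8030, -122.4480),
    Panorama("C", 37.8040, -122.4470),
]
TRAJECTORY = [(37.8021, -122.4489), (37.8031, -122.4481), (37.8039, -122.4471)]

context_ids = context_along_trajectory(PANOS, TRAJECTORY)  # [['A'], ['B'], ['C']]
```

The point of refreshing the context per step is the one made above: when the camera moves behind a building, the model gets a real capture of what is there instead of inventing it. Raising `k` trades more grounding for a heavier context, which is the real-time cost mentioned earlier.
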

---

## Conclusion: Will GTA 7 Look Like This?

Anyway, this is a bit different from my usual content, but it's one of those I/O announcements I had to cover, especially since I got early access. Hope you enjoyed it. If you have thoughts on where this technology is going, I'd love to hear them. Do you think GTA 7 will actually be like this? Let me know in the comments below.

---

**Source:** Google Just Turned Street View Into a Video Game – SnowmanRandom (https://youtube.com/watch?si=wgJErm9jJ8v4FTCt&v=bxv4IkobUPI)

Similar Articles

Google’s Genie world model can now simulate real streets with Street View

Simulate real-world places with Project Genie and Street View

@GoogleDeepMind: Street View imagery in Project Genie is rolling out to all eligible Google AI Ultra subscribers globally (18+). Try it …

Google's Genie 3 turns a text prompt into a playable open world you can explore. It's rough now. Future of games, or a tech demo?

Project Genie: Experimenting with infinite, interactive worlds
