State-of-the-art video and image generation with Veo 2 and Imagen 3

Google DeepMind Blog Models

Summary

Google announced Veo 2 and Imagen 3, state-of-the-art video and image generation models now available in VideoFX, ImageFX, and a new tool called Whisk. Veo 2 generates high-quality videos at resolutions up to 4K, with an improved understanding of physics and cinematography, while Imagen 3 produces brighter, better-composed images in more diverse art styles.

We're rolling out a new, state-of-the-art video model, Veo 2, and updates to Imagen 3. Plus, check out our new experiment, Whisk.

Cached at: 04/20/26, 08:36 AM

# State-of-the-art video and image generation with Veo 2 and Imagen 3

Source: https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/

We're announcing new versions of Veo and Imagen, and introducing our latest experiment in image generation: Whisk.

Elias Roman, Senior Director, Product Management, Google Labs

Earlier this year (https://blog.google/technology/ai/google-generative-ai-veo-imagen-3/), we introduced our video generation model, Veo, and our latest image generation model, Imagen 3. Since then, it's been exciting to watch people bring their ideas to life with help from these models: YouTube creators are exploring the creative possibilities of video backgrounds (https://www.youtube.com/watch?v=HO-Z5kO8scA) for their YouTube Shorts, enterprise customers are enhancing creative workflows on Vertex AI (https://cloud.google.com/blog/products/ai-machine-learning/introducing-veo-and-imagen-3-on-vertex-ai), and creatives are using VideoFX (https://labs.google/fx/tools/video-fx) and ImageFX (https://labs.google/fx/tools/image-fx) to tell their stories. Together with collaborators ranging from filmmakers to businesses, we're continuing to develop and evolve these technologies.

Today we're introducing a new video model, Veo 2, and the latest version of Imagen 3, both of which achieve state-of-the-art results.
These models are now available in VideoFX, ImageFX and our newest Labs experiment, Whisk (https://labs.google/fx/tools/whisk).

## Veo 2: state-of-the-art video generation

Veo 2 creates incredibly high-quality videos in a wide range of subjects and styles. In head-to-head comparisons judged by human raters, Veo 2 achieved state-of-the-art results (https://deepmind.google/technologies/veo/veo-2) against leading models. It brings an improved understanding of real-world physics and the nuances of human movement and expression, which helps improve its detail and realism overall.

Veo 2 understands the unique language of cinematography: ask it for a genre, specify a lens, suggest cinematic effects, and Veo 2 will deliver, at resolutions up to 4K and with videos extending to minutes in length. Ask for a low-angle tracking shot that glides through the middle of a scene, or a close-up shot on the face of a scientist looking through her microscope, and Veo 2 creates it. Suggest "18mm lens" in your prompt and Veo 2 knows to craft the wide-angle shot that this lens is known for, or blur out the background and focus on your subject by putting "shallow depth of field" in your prompt.

Examples of Veo 2's high-quality video generation capabilities. All videos were generated by Veo 2 and have not been modified.
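The cues described above (shot type, lens, cinematic effects, genre) are ordinary words in a text prompt. As a minimal sketch, assuming you want to keep those cues as separate fields and join them into one prompt string, the hypothetical `ShotSpec` and `build_prompt` helpers below do that. They are illustrative only: they do not call Veo 2 or any Google API, and the comma-joined format is an assumption, not a documented prompt syntax.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ShotSpec:
    """Hypothetical bundle of the cinematic cues the post says Veo 2 understands."""
    subject: str
    shot: Optional[str] = None   # e.g. "low-angle tracking shot"
    lens: Optional[str] = None   # e.g. "18mm lens" for a wide-angle look
    effects: List[str] = field(default_factory=list)  # e.g. ["shallow depth of field"]
    genre: Optional[str] = None  # e.g. "film noir"


def build_prompt(spec: ShotSpec) -> str:
    """Join the cues into one comma-separated text prompt, skipping unset ones."""
    parts = [spec.shot, spec.subject, spec.lens, *spec.effects]
    if spec.genre:
        parts.append(f"{spec.genre} style")
    # Filter out None so unset fields don't leave empty slots between commas.
    return ", ".join(p for p in parts if p)


# Two prompts built from the examples mentioned in the post.
close_up = build_prompt(ShotSpec(
    subject="a scientist looking through her microscope",
    shot="close-up shot",
    effects=["shallow depth of field"],
))
tracking = build_prompt(ShotSpec(
    subject="a busy night market",
    shot="low-angle tracking shot",
    lens="18mm lens",
    genre="film noir",
))
print(close_up)
print(tracking)
```

Whether terse, comma-separated cues work better than full sentences is not something the post states; its own example prompts are written as descriptive prose, so treat this helper as a way to organize cues rather than a recommended prompt format.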
While video models often "hallucinate" unwanted details, such as extra fingers or unexpected objects, Veo 2 produces these less frequently, making its outputs more realistic.

Our commitment to safety and responsible development has guided Veo 2. We have been intentionally measured in growing Veo's availability, so we can help identify, understand and improve the model's quality and safety while slowly rolling it out via VideoFX, YouTube and Vertex AI. Just like the rest of our image and video generation models, Veo 2 outputs include an invisible SynthID watermark that helps identify them as AI-generated and reduce the chances of misinformation and misattribution.

Today, we're bringing our new Veo 2 capabilities to our Google Labs video generation tool, VideoFX, and expanding the number of users who can access it. Visit Google Labs (https://labs.google/fx/tools/video-fx) to sign up for the waitlist. We also plan to expand Veo 2 to YouTube Shorts and other products next year.
*Note: Find prompts for all videos at the bottom of this post: Scientist¹ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-1), Cartoon character² (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-2), Bees³ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-3), Flamingos⁴ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-4), Cube⁵ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-5), Dog⁶ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-6), Pancakes⁷ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-7)*

## Imagen 3: state-of-the-art image generation

We've also improved our Imagen 3 (https://deepmind.google/technologies/imagen-3/) image-generation model, which now generates brighter, better-composed images. It can now render more diverse art styles with greater accuracy, from photorealism to impressionism, from abstract to anime. This upgrade also follows prompts more faithfully and renders richer details and textures. In side-by-side comparisons judged by human raters against leading image generation models, Imagen 3 achieved state-of-the-art results (https://deepmind.google/technologies/imagen-3/).

Starting today, the latest Imagen 3 model will roll out globally in ImageFX, our image generation tool from Google Labs, to more than 100 countries. Visit ImageFX (https://labs.google/fx/tools/image-fx) to get started.

A close-up shot captures a winter wonderland scene – soft snowflakes fall on a snow-covered forest floor.
Behind a frosted pine branch, a red squirrel sits, its bright orange fur a splash of color against the white. It holds a small hazelnut. As it enjoys its meal, it seems oblivious to the falling snow.

Examples of Imagen 3's rich detail, image quality and composition

An extreme close-up of a craftsperson's hands shaping a glowing piece of pottery on a wheel. Threads of golden, luminous energy connect the potter's hands to the clay, swirling dynamically with their movements.

A foggy 1940s European train station at dawn, framed by intricate wrought-iron arches and misted glass windows. Steam rises from the tracks, blending with dense fog. Two lovers stand in an emotional embrace near the train, backlit by the warm, amber glow of dim lanterns. The departing train is partially visible, its red tail lights fading into the mist. The woman wears a faded red coat and clutches a small leather diary, while the man is dressed in a weathered soldier's uniform. Dust motes float in the air, illuminated by the soft golden backlight. The atmosphere is melancholic and timeless, evoking the bittersweet farewell of wartime cinema.

A portrait of an Asian woman with neon green lights in the background, shallow depth of field.

A close-up, macro photography stock photo of a strawberry intricately sculpted into the shape of a hummingbird in mid-flight, its wings a blur as it sips nectar from a vibrant, tubular flower. The backdrop features a lush, colorful garden with a soft, bokeh effect, creating a dreamlike atmosphere. The image is exceptionally detailed and captured with a shallow depth of field, ensuring a razor-sharp focus on the strawberry-hummingbird and gentle fading of the background.
The high resolution, professional photographers style, and soft lighting illuminate the scene in a very detailed manner, professional color grading amplifies the vibrant colors and creates an image with exceptional clarity. The depth of field makes the hummingbird and flower stand out starkly against the bokeh background.

*Note: Find prompts for all images at the bottom of this post: Potter⁸ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-8), Squirrel⁹ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-9), Train station¹⁰ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-10), Woman¹¹ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-11), Strawberry bird¹² (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-12)*

## Whisk: a fun new tool that lets you prompt with images to visualize your ideas

Whisk (https://labs.google/fx/tools/whisk), our newest experiment from Google Labs, lets you input or create images that convey the subject, scene and style you have in mind. Then, you can bring them together and remix them to create something uniquely your own, from a digital plushie to an enamel pin or sticker.

Under the hood, Whisk combines our latest Imagen 3 model with Gemini's visual understanding and description capabilities. The Gemini model automatically writes a detailed caption of each of your images, and those descriptions are then fed into Imagen 3. This process allows you to easily remix your subjects, scenes and styles in fun, new ways.

Whisk is launching in the U.S. today.
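The caption-then-generate flow described above can be sketched as a small pipeline. This is a hypothetical illustration, not Whisk's implementation: `caption` stands in for Gemini's image description step and `generate` stands in for Imagen 3, both modeled as plain callables so the sketch runs offline, and the `compose_prompt` template that merges the subject, scene and style captions is an assumption.

```python
from typing import Callable, Dict

# Hypothetical stand-ins: in the post, Gemini writes the captions and Imagen 3
# renders the result. Here both are plain callables so the sketch runs offline.
Captioner = Callable[[bytes], str]
ImageGenerator = Callable[[str], bytes]

ROLES = ("subject", "scene", "style")


def compose_prompt(captions: Dict[str, str]) -> str:
    """Merge per-role captions into one text prompt (hypothetical template)."""
    missing = [role for role in ROLES if role not in captions]
    if missing:
        raise ValueError(f"missing captions for: {', '.join(missing)}")
    return (f"{captions['subject']}, placed in {captions['scene']}, "
            f"rendered in the style of {captions['style']}")


def remix(images: Dict[str, bytes], caption: Captioner,
          generate: ImageGenerator) -> bytes:
    """Caption each input image, then feed the combined description to the generator."""
    captions = {role: caption(img) for role, img in images.items()}
    return generate(compose_prompt(captions))


# Demo with fake components: the "captioner" looks up canned text and the
# "generator" just echoes the prompt back as bytes.
fake_captions = {
    b"img-a": "a small knitted fox",
    b"img-b": "a snowy pine forest",
    b"img-c": "an enamel pin",
}
result = remix(
    {"subject": b"img-a", "scene": b"img-b", "style": b"img-c"},
    caption=fake_captions.__getitem__,
    generate=lambda prompt: prompt.encode(),
)
print(result.decode())
```

Keeping the captioner and generator as injected functions makes the text hand-off between the two models explicit: the image generator only ever sees words, which is why the subject, scene and style can be swapped independently before the final render.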
Read more about Whisk (https://blog.google/technology/google-labs/whisk) and try it out at labs.google/Whisk (https://labs.google/fx/tools/whisk).