# State-of-the-art video and image generation with Veo 2 and Imagen 3
Source: https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/
We're announcing new versions of Veo and Imagen, and introducing our latest experiment in image generation: Whisk.
Elias Roman
Senior Director, Product Management, Google Labs
## General summary
Google has released updated versions of its video and image generation models, Veo 2 and Imagen 3. These models are now available in the Google Labs tools VideoFX and ImageFX, and in a new tool called Whisk. Veo 2 generates high-quality videos with improved realism and understanding of cinematography, while Imagen 3 produces brighter, better-composed images with more diverse art styles.
Summaries were generated by Google AI. Generative AI is experimental.
Earlier this year (https://blog.google/technology/ai/google-generative-ai-veo-imagen-3/), we introduced our video generation model, Veo, and our latest image generation model, Imagen 3. Since then, it's been exciting to watch people bring their ideas to life with help from these models: YouTube creators are exploring the creative possibilities of video backgrounds (https://www.youtube.com/watch?v=HO-Z5kO8scA) for their YouTube Shorts, enterprise customers are enhancing creative workflows on Vertex AI (https://cloud.google.com/blog/products/ai-machine-learning/introducing-veo-and-imagen-3-on-vertex-ai) and creatives are using VideoFX (https://labs.google/fx/tools/video-fx) and ImageFX (https://labs.google/fx/tools/image-fx) to tell their stories. Together with collaborators ranging from filmmakers to businesses, we're continuing to develop and evolve these technologies.
Today we're introducing a new video model, Veo 2, and the latest version of Imagen 3, both of which achieve state-of-the-art results. These models are now available in VideoFX, ImageFX and our newest Labs experiment, Whisk (https://labs.google/fx/tools/whisk).
## Veo 2: state-of-the-art video generation
Veo 2 creates incredibly high-quality videos in a wide range of subjects and styles. In head-to-head comparisons judged by human raters, Veo 2 achieved state-of-the-art results (https://deepmind.google/technologies/veo/veo-2) against leading models.
It brings an improved understanding of real-world physics and the nuances of human movement and expression, which helps improve its detail and realism overall. Veo 2 understands the unique language of cinematography: ask it for a genre, specify a lens, suggest cinematic effects and Veo 2 will deliver — at resolutions up to 4K, and extended to minutes in length. Ask for a low-angle tracking shot that glides through the middle of a scene, or a close-up shot on the face of a scientist looking through her microscope, and Veo 2 creates it. Suggest "18mm lens" in your prompt and Veo 2 knows to craft the wide angle shot that this lens is known for, or blur out the background and focus on your subject by putting "shallow depth of field" in your prompt.
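As a rough illustration of the prompting style described above, the sketch below assembles a text prompt from a subject, a shot type, a lens and cinematic effects. `ShotSpec` and `build_prompt` are hypothetical helpers invented for this example, not part of any Veo or VideoFX API, and the comma-separated layout is only an assumption about how one might phrase such a prompt.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class ShotSpec:
    """Hypothetical container for the parts of a video prompt (not a Veo API type)."""
    subject: str
    shot: str = ""                 # e.g. "low-angle tracking shot"
    lens: str = ""                 # e.g. "18mm lens"
    effects: Tuple[str, ...] = ()  # e.g. ("shallow depth of field",)
    genre: str = ""


def build_prompt(spec: ShotSpec) -> str:
    """Join the non-empty parts into one comma-separated text prompt."""
    lead = f"{spec.shot} of {spec.subject}" if spec.shot else spec.subject
    parts = [lead, spec.lens, *spec.effects, spec.genre]
    return ", ".join(p for p in parts if p)


prompt = build_prompt(ShotSpec(
    subject="a scientist looking through her microscope",
    shot="close-up shot",
    effects=("shallow depth of field",),
))
print(prompt)
```

Keeping the cinematography terms in separate fields makes it easy to vary one element at a time, such as swapping the lens or adding an effect, while holding the subject fixed.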
Examples of Veo 2's high-quality video generation capabilities. All videos were generated by Veo 2 and have not been modified.
While video models often "hallucinate" unwanted details — extra fingers or unexpected objects, for example — Veo 2 produces these less frequently, making outputs more realistic.
Our commitment to safety and responsible development has guided Veo 2. We have been intentionally measured in growing Veo's availability, so we can help identify, understand and improve the model's quality and safety while slowly rolling it out via VideoFX, YouTube and Vertex AI.
Just like the rest of our image and video generation models, Veo 2 outputs include an invisible SynthID watermark that helps identify them as AI-generated, helping reduce the chances of misinformation and misattribution.
Today, we're bringing our new Veo 2 capabilities to our Google Labs video generation tool, VideoFX, and expanding the number of users who can access it. Visit Google Labs (https://labs.google/fx/tools/video-fx) to sign up for the waitlist. We also plan to expand Veo 2 to YouTube Shorts and other products next year.
*Note: Find prompts for all videos at the bottom of this post: Scientist¹ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-1), Cartoon character² (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-2), Bees³ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-3), Flamingos⁴ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-4), Cube⁵ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-5), Dog⁶ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-6), Pancakes⁷ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-7)*
## Imagen 3: state-of-the-art image generation
We've also improved our Imagen 3 (https://deepmind.google/technologies/imagen-3/) image-generation model, which now generates brighter, better-composed images. It can now render more diverse art styles with greater accuracy, from photorealism to impressionism and from abstract to anime. This upgrade also follows prompts more faithfully and renders richer details and textures. In side-by-side comparisons against leading image generation models, judged by human raters, Imagen 3 achieved state-of-the-art results (https://deepmind.google/technologies/imagen-3/).
Starting today, the latest Imagen 3 model will roll out globally in ImageFX, our image generation tool from Google Labs, reaching more than 100 countries. Visit ImageFX (https://labs.google/fx/tools/image-fx) to get started.
A close-up shot captures a winter wonderland scene – soft snowflakes fall on a snow-covered forest floor. Behind a frosted pine branch, a red squirrel sits, its bright orange fur a splash of color against the white. It holds a small hazelnut. As it enjoys its meal, it seems oblivious to the falling snow.
Examples of Imagen 3's rich detail and image quality composition
An extreme close-up of a craftsperson's hands shaping a glowing piece of pottery on a wheel. Threads of golden, luminous energy connect the potter's hands to the clay, swirling dynamically with their movements.
A foggy 1940s European train station at dawn, framed by intricate wrought-iron arches and misted glass windows. Steam rises from the tracks, blending with dense fog. Two lovers stand in an emotional embrace near the train, backlit by the warm, amber glow of dim lanterns. The departing train is partially visible, its red tail lights fading into the mist. The woman wears a faded red coat and clutches a small leather diary, while the man is dressed in a weathered soldier's uniform. Dust motes float in the air, illuminated by the soft golden backlight. The atmosphere is melancholic and timeless, evoking the bittersweet farewell of wartime cinema.
A portrait of an Asian woman with neon green lights in the background, shallow depth of field.
A close-up, macro photography stock photo of a strawberry intricately sculpted into the shape of a hummingbird in mid-flight, its wings a blur as it sips nectar from a vibrant, tubular flower. The backdrop features a lush, colorful garden with a soft, bokeh effect, creating a dreamlike atmosphere. The image is exceptionally detailed and captured with a shallow depth of field, ensuring a razor-sharp focus on the strawberry-hummingbird and gentle fading of the background. The high resolution, professional photographers style, and soft lighting illuminate the scene in a very detailed manner, professional color grading amplifies the vibrant colors and creates an image with exceptional clarity. The depth of field makes the hummingbird and flower stand out starkly against the bokeh background.
*Note: Find prompts for all images at the bottom of this post: Potter⁸ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-8), Squirrel⁹ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-9), Train station¹⁰ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-10), Woman¹¹ (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-11), Strawberry bird¹² (https://blog.google/innovation-and-ai/models-and-research/google-labs/video-image-generation-update-december-2024/#footnote-12)*
## Whisk: a fun new tool that lets you prompt with images to visualize your ideas
Whisk (https://labs.google/fx/tools/whisk), our newest experiment from Google Labs, lets you input or create images that convey the subject, scene and style you have in mind. Then, you can bring them together and remix them to create something uniquely your own, from a digital plushie to an enamel pin or sticker.
Under the hood, Whisk combines our latest Imagen 3 model with Gemini's visual understanding and description capabilities. The Gemini model automatically writes a detailed caption for each of your images, then feeds those captions into Imagen 3. This process allows you to easily remix your subjects, scenes and styles in fun, new ways.
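The caption-then-generate flow described above can be sketched in a few lines. This is only an illustration under stated assumptions: `remix`, `fake_caption` and `fake_generate` are hypothetical names, the two callables stand in for a captioning model and a text-to-image model, and the way the captions are merged into one prompt is a guess, since the post does not say how Whisk combines them.

```python
from typing import Callable, Dict

Captioner = Callable[[bytes], str]   # stand-in for an image-captioning model
Generator = Callable[[str], bytes]   # stand-in for a text-to-image model


def remix(images: Dict[str, bytes], caption: Captioner, generate: Generator) -> bytes:
    """Caption each role image, merge the captions into one prompt, then generate."""
    roles = ("subject", "scene", "style")
    captions = {role: caption(images[role]) for role in roles if role in images}
    # Assumed prompt layout: one labeled sentence per role that was supplied.
    prompt = ". ".join(f"{role.capitalize()}: {text}" for role, text in captions.items())
    return generate(prompt)


# Toy stand-ins so the sketch runs without access to any model.
def fake_caption(image: bytes) -> str:
    return f"an image of {len(image)} bytes"


def fake_generate(prompt: str) -> bytes:
    return prompt.encode("utf-8")


result = remix({"subject": b"abc", "style": b"12345"}, fake_caption, fake_generate)
print(result.decode("utf-8"))
```

Passing the two models in as callables keeps the sketch independent of any particular service; real captioning and generation calls could be substituted without changing `remix`.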
Whisk is launching in the U.S. today. Read more about Whisk (https://blog.google/technology/google-labs/whisk) and try it out at labs.google/Whisk (https://labs.google/fx/tools/whisk).