Veo - Google DeepMind

AI Plus: 🎨Art
AI format: Google Veo
Created time: Jun 19, 2024 2:02 AM
Highly Recommend
Platform: Website
Posted By: Google DeepMind

Veo is our most capable video generation model to date. It generates high-quality, 1080p resolution videos that can go beyond a minute, in a wide range of cinematic and visual styles.

It accurately captures the nuance and tone of a prompt and provides an unprecedented level of creative control, understanding prompts for all kinds of cinematic effects, such as time lapses or aerial shots of a landscape.

Our video generation model will help create tools that make video production accessible to everyone. Whether you're a seasoned filmmaker, aspiring creator, or educator looking to share knowledge, Veo unlocks new possibilities for storytelling, education and more.

Over the coming weeks some of these features will be available to select creators through VideoFX, a new experimental tool at labs.google. You can join the waitlist now.

In the future, we’ll also bring some of Veo’s capabilities to YouTube Shorts and other products.

Prompt: A lone cowboy rides his horse across an open plain at beautiful sunset, soft light, warm colors

Prompt: A fast-tracking shot down a suburban residential street lined with trees. Daytime with a clear blue sky. Saturated colors, high contrast

Prompt: Extreme close-up of chicken and green pepper kebabs grilling on a barbeque with flames. Shallow focus and light smoke. vivid colours

Prompt: Timelapse of the northern lights dancing across the Arctic sky, stars twinkling, snow-covered landscape

Prompt: An aerial shot of a lighthouse standing tall on a rocky cliff, its beacon cutting through the early dawn, waves crash against the rocks below

Greater understanding of language and vision

To produce a coherent scene, generative video models need to accurately interpret a text prompt and combine this information with relevant visual references.

With advanced understanding of natural language and visual semantics, Veo generates video that closely follows the prompt. It accurately captures the nuance and tone in a phrase, rendering intricate details within complex scenes.

Prompt: Many spotted jellyfish pulsating under water. Their bodies are transparent and glowing in deep ocean

Prompt: Timelapse of a common sunflower opening, dark background

Prompt: extreme close-up with a shallow depth of field of a puddle in a street. reflecting a busy futuristic Tokyo city with bright neon signs, night, lens flare

When given both an input video and an editing command, such as adding kayaks to an aerial shot of a coastline, Veo can apply the command to the initial video and create a new, edited video.

Prompt: Drone shot along the Hawaii jungle coastline, sunny day

Prompt: Drone shot along the Hawaii jungle coastline, sunny day. Kayaks in the water

In addition, Veo supports masked editing: when you supply a mask area along with your video and text prompt, changes are applied only to that specific area of the video.
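The idea of a mask limiting an edit to one region can be illustrated with a simple per-pixel blend. This is only a sketch of the general concept, not Veo's mechanism, which the text does not describe. The function names, the grayscale frame representation, and the blending rule are all assumptions made for illustration.

```python
from typing import List

# Hypothetical representation: one grayscale frame is a grid of
# pixel intensities in [0, 1]; a video is a list of such frames.
Frame = List[List[float]]


def composite_with_mask(original: Frame, edited: Frame, mask: Frame) -> Frame:
    """Blend two frames: mask=1 takes the edited pixel, mask=0 keeps the original."""
    if not (len(original) == len(edited) == len(mask)):
        raise ValueError("frames and mask must have the same height")
    out: Frame = []
    for o_row, e_row, m_row in zip(original, edited, mask):
        if not (len(o_row) == len(e_row) == len(m_row)):
            raise ValueError("frames and mask must have the same width")
        out.append([m * e + (1.0 - m) * o for o, e, m in zip(o_row, e_row, m_row)])
    return out


def composite_video(original: List[Frame], edited: List[Frame], mask: Frame) -> List[Frame]:
    """Apply the same (assumed static) mask to every pair of frames."""
    return [composite_with_mask(o, e, mask) for o, e in zip(original, edited)]


# Only the top-left pixel lies inside the mask, so only it changes.
original = [[0.0, 0.0], [0.0, 0.0]]
edited = [[1.0, 1.0], [1.0, 1.0]]
mask = [[1.0, 0.0], [0.0, 0.0]]
result = composite_with_mask(original, edited, mask)
```

In this toy example every pixel outside the mask keeps its original value, which is the behavior the paragraph above describes at a high level.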

Veo can also generate a video from an input image together with a text prompt. Providing a reference image alongside the prompt conditions Veo to generate a video that follows the image’s style and the prompt’s instructions.

Prompt: Alpacas wearing knit wool sweaters, graffiti background, sunglasses

Prompt: Alpacas dancing to the beat

The model can also make video clips and extend them to 60 seconds and beyond. It can do this either from a single prompt or from a sequence of prompts that together tell a story.
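One way to picture a sequence of prompts driving a longer clip is as a timeline in which each prompt covers one stretch of the video. The text does not say how Veo maps prompts to time, so the equal-length split below is purely an assumption, and `schedule_prompts` is a hypothetical helper, not part of any Veo interface.

```python
from typing import List, Tuple


def schedule_prompts(prompts: List[str], total_seconds: float) -> List[Tuple[float, float, str]]:
    """Split a clip into equal segments and assign one prompt to each.

    Returns (start_second, end_second, prompt) tuples in order.
    The equal split is an illustrative assumption.
    """
    if not prompts:
        raise ValueError("at least one prompt is required")
    if total_seconds <= 0:
        raise ValueError("total_seconds must be positive")
    segment = total_seconds / len(prompts)
    return [(i * segment, (i + 1) * segment, p) for i, p in enumerate(prompts)]


# Shortened stand-ins for a four-prompt story, spread over 60 seconds.
story = [
    "A fast-tracking shot through a bustling dystopian sprawl",
    "A fast-tracking shot through a futuristic dystopian sprawl",
    "A neon hologram of a car driving at top speed",
    "The cars leave the tunnel",
]
timeline = schedule_prompts(story, 60.0)
```

With a single prompt the same helper returns one segment spanning the whole clip, which mirrors the single-prompt case mentioned above.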

Prompts:

A fast-tracking shot through a bustling dystopian sprawl with bright neon signs, flying cars and mist, night, lens flare, volumetric lighting.

A fast-tracking shot through a futuristic dystopian sprawl with bright neon signs, starships in the sky, night, volumetric lighting.

A neon hologram of a car driving at top speed, speed of light, cinematic, incredible details, volumetric lighting.

The cars leave the tunnel, back into the real world city Hong Kong.

Consistency across video frames

Maintaining visual consistency can be a challenge for video generation models. Characters, objects, or even entire scenes can flicker, jump, or morph unexpectedly between frames, disrupting the viewing experience.

Veo's cutting-edge latent diffusion transformers reduce the appearance of these inconsistencies, keeping characters, objects and styles in place, as they would be in real life.
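Flicker between frames can be made concrete with a crude measurement: the average absolute change in pixel values from one frame to the next. This is not a metric the text mentions, and genuine motion also raises the score, so treat it only as a sketch of what "inconsistency between frames" means numerically. The frame representation and function name are assumptions.

```python
from typing import List

# Hypothetical representation: a frame is a grid of intensities in [0, 1].
Frame = List[List[float]]


def mean_frame_difference(frames: List[Frame]) -> float:
    """Average absolute per-pixel change between consecutive frames."""
    total = 0.0
    count = 0
    for prev, curr in zip(frames, frames[1:]):
        for p_row, c_row in zip(prev, curr):
            for p, c in zip(p_row, c_row):
                total += abs(c - p)
                count += 1
    # Fewer than two frames (or empty frames) means nothing to compare.
    return total / count if count else 0.0


# A perfectly steady clip versus one whose brightness jumps every frame.
steady = [[[0.5, 0.5]], [[0.5, 0.5]], [[0.5, 0.5]]]
flickering = [[[0.0, 0.0]], [[1.0, 1.0]], [[0.0, 0.0]]]
steady_score = mean_frame_difference(steady)
flicker_score = mean_frame_difference(flickering)
```

The steady clip scores zero while the flickering clip scores the maximum possible value for this toy data, which is the kind of frame-to-frame jump described above.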

Prompt: A panning shot of a serene mountain landscape, the camera slowly revealing snow-capped peaks, granite rocks and a crystal-clear lake reflecting the sky

Prompt: moody shot of a central European alley film noir cinematic black and white high contrast high detail

Prompt: Crochet elephant in intricate patterns walking on the savanna

Built upon years of video generation research

Veo builds upon years of generative video model work, including Generative Query Network (GQN), DVD-GAN, Imagen-Video, Phenaki, WALT, VideoPoet and Lumiere, as well as our Transformer architecture and Gemini.

To help Veo understand and follow prompts more accurately, we have also added more details to the captions of each video in its training data. To further improve performance, the model works with high-quality, compressed representations of video (also known as latents), which makes it more efficient. These steps improve overall quality and reduce the time it takes to generate videos.
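A rough calculation shows why working on latents rather than raw pixels saves effort. The numbers below are assumptions chosen for illustration only: the text gives 1080p and clips of a minute, but not the frame rate, the compression factors, or the latent channel count, so 24 frames per second, 4x temporal and 8x spatial downsampling, and 8 latent channels are all hypothetical.

```python
from math import ceil
from typing import Tuple


def element_count(shape: Tuple[int, int, int, int]) -> int:
    """Number of values in a (frames, height, width, channels) tensor."""
    frames, height, width, channels = shape
    return frames * height * width * channels


def latent_shape(
    pixel_shape: Tuple[int, int, int, int],
    temporal_factor: int,
    spatial_factor: int,
    latent_channels: int,
) -> Tuple[int, int, int, int]:
    """Shape after downsampling time and space by the given factors."""
    frames, height, width, _ = pixel_shape
    return (
        ceil(frames / temporal_factor),
        ceil(height / spatial_factor),
        ceil(width / spatial_factor),
        latent_channels,
    )


# 60 seconds of 1080p RGB video at an assumed 24 frames per second.
pixels = (60 * 24, 1080, 1920, 3)
# Hypothetical compression: 4x in time, 8x in each spatial axis, 8 channels.
latents = latent_shape(pixels, temporal_factor=4, spatial_factor=8, latent_channels=8)

pixel_values = element_count(pixels)
latent_values = element_count(latents)
ratio = pixel_values / latent_values
```

Under these assumed factors the latent tensor holds 96 times fewer values than the raw pixels, so each generation step touches far less data. The real ratio for Veo is not stated in the text.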

Responsible by design

It's critical to bring technologies like Veo to the world responsibly. Videos created by Veo are watermarked using SynthID, our cutting-edge tool for watermarking and identifying AI-generated content, and passed through safety filters and memorization checking processes that help mitigate privacy, copyright and bias risks.

Veo’s future will be informed by our work with leading creators and filmmakers. Their feedback helps us improve our generative video technologies and makes sure they benefit the wider creative community and beyond.

Note: All videos on this page were generated by Veo and have not been modified.

Acknowledgements

This work was made possible by the exceptional contributions of: Abhishek Sharma, Adams Yu, Ali Razavi, Andeep Toor, Andrew Pierson, Ankush Gupta, Austin Waters, Aäron van den Oord, Daniel Tanis, Dumitru Erhan, Eric Lau, Eleni Shaw, Gabe Barth-Maron, Greg Shaw, Han Zhang, Henna Nandwani, Hernan Moraldo, Hyunjik Kim, Irina Blok, Jakob Bauer, Jeff Donahue, Junyoung Chung, Kory Mathewson, Kurtis David, Lasse Espeholt, Marc van Zee, Matt McGill, Medhini Narasimhan, Miaosen Wang, Mikołaj Bińkowski, Mohammad Babaeizadeh, Mohammad Taghi Saffar, Nando de Freitas, Nick Pezzotti, Pieter-Jan Kindermans, Poorva Rane, Rachel Hornung, Robert Riachi, Ruben Villegas, Rui Qian, Sander Dieleman, Serena Zhang, Serkan Cabi, Shixin Luo, Shlomi Fruchter, Signe Nørly, Srivatsan Srinivasan, Tobias Pfaff, Tom Hume, Vikas Verma, Weizhe Hua, William Zhu, Xinchen Yan, Xinyu Wang, Yelin Kim, Yuqing Du and Yutian Chen.

We extend our gratitude to Aida Nematzadeh, Alex Cullum, Anja Hauth, April Lehman, Benigno Uria, Charlie Chen, Charlie Nash, Charline Le Lan, Conor Durkan, Cristian Țăpuș, David Bridson, David Ding, David Steiner, Emanuel Taropa, Evgeny Gladchenko, Frankie Garcia, Gavin Buttimore, Geng Yan, Greg Shaw, Hadi Hashemi, Harsha Vashisht, Hartwig Adam, Huisheng Wang, Jacob Austin, Jacob Kelly, Jacob Walker, Jim Lin, Jonas Adler, Joost van Amersfoort, Jordi Pont-Tuset, Josh V. Dillon, Josh Newlan, Junlin Zhang, Junwhan Ahn, Katie Zhang, Kelvin Xu, Kristian Kjems, Lois Zhou, Luis C. Cobo, Maigo Le, Malcolm Reynolds, Marcus Wainwright, Mary Cassin, Mateusz Malinowski, Matt Smart, Matt Young, Mingda Zhang, Minh Giang, Moritz Dickfeld, Nancy Xiao, Nelly Papalampidi, Nikhil Khadke, Nir Shabat, Oliver Woodman, Ollie Purkiss, Oskar Bunyan, Patrice Oehen, Pauline Luc, Pete Aykroyd, Petko Georgiev, Phil Chen, Rakesh Shivanna, Ramya Ganeshan, Richard Nguyen, RJ Mical, Robin Strudel, Rohan Anil, Sam Haves, Shanshan Zheng, Sholto Douglas, Siddhartha Brahma, Tatiana López, Victor Gomes, Vighnesh Birodkar, Xin Chen, Yaroslav Ganin, Yi-Ling Wang, Yilin Ma, Yori Zwols, Yu Qiao, Yuchen Liang, Yusuf Aytar and Zu Kim for their invaluable partnership in developing and refining key components of this project.

Special thanks to Douglas Eck, Oriol Vinyals, Eli Collins, Koray Kavukcuoglu and Demis Hassabis for their insightful guidance and support throughout the research process.

We also acknowledge the many other individuals who contributed across Google DeepMind and our partners at Google.