Google researchers have found a novel way to turn a single human photo into an AI-generated video.

Google researchers have discovered a way to generate video of a person from a single still image. This makes it possible to generate a video of someone speaking from input text, or to change a person's mouth movements to match an audio track in a different language from the one originally spoken. It also feels like a step toward identity theft and misinformation, but what is AI without the prospect of horrific consequences?

The technology itself is quite interesting, and the Google researchers who published the paper call it Vlogger. In the paper, the authors (Enric Corona et al.) provide various examples of how the AI can take an input image of a human (in this case, I assume, mostly AI-generated humans) and an audio file, then generate facial and body movements that match the audio.

This is just one example of the potential uses of this technology. Another is video editing, particularly of the subject's facial expressions. As an example, the researchers show different versions of the same video: a presenter talking to the camera, a presenter with his mouth eerily closed, a presenter with his eyes closed, and so on. My favorite is a video of a presenter whose eyes have been artificially held open by the AI, so he never even blinks. The serial killer vibe is amazing. Thanks, AI.

The most useful feature, in my opinion, is the ability to swap a video's audio track for a foreign-language dub and have the AI lip-sync the person's facial movements to the new audio.

This works through two stages: "1) a probabilistic human-to-3D motion diffusion model, and 2) a new diffusion-based architecture that augments the text-to-image model with temporal and spatial control. This approach enables the generation of variable-length, high-quality video that can be easily controlled by high-level representations of human faces and bodies," the GitHub page states.

To be sure, this technology is not perfect. The mouth movements in the examples have some of the traits common to AI-generated video, which can be quite creepy at times, as pointed out by a user responding to a thread about the technology by EyeingAI on X. But Vlogger doesn't have to fool everyone, or even anyone. That said, a more convincing version would be even more worrisome, considering how it could be used to create deepfakes, spread misinformation, steal identities, and so on. That day will come. I hope that by then we will have a few more established ways of dealing with this stuff.
