Google Research recently released Dreamix, an AI model that can perform text-based motion and appearance editing of general videos.

While several image-editing tools based on diffusion models have already been released, the researchers describe Dreamix as the first diffusion-based method for text-based motion and appearance editing of general videos.

The model essentially functions in three ways: 

(i) With only a video and a text prompt, the model is capable of editing videos while preserving fidelity to color, posture, object size, and camera pose, ensuring that the resulting video is temporally consistent. 

(ii) Similarly, for a small collection of images with the same subject given as input, the model can generate new videos with the subject in motion. 

(iii) Additionally, with only an image and a text input, the model can create videos while preserving visual fidelity to object location and background.

To ensure high-quality output and fidelity to both the input and the text prompt, the team adapted a text-conditioned video diffusion model (VDM) using two main ideas. First, rather than initialising the model with pure noise, they used a degraded version of the original video, produced by downscaling it and adding noise so that only low-resolution spatio-temporal information is kept. Second, they further improved fidelity to the original video with a mixed finetuning approach, in which the VDM is finetuned not only on the input video but also on its individual frames, with their temporal order discarded by masking the temporal attention.
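The first idea, starting from a degraded copy of the input rather than from pure noise, can be illustrated with a toy sketch. This is not the Dreamix implementation: the function name `degrade_video`, the average-pooling downscale, the `factor` and `noise_std` parameters, and the nested-list grayscale video format are all assumptions made for illustration. In the actual method the noise level would follow the diffusion model's schedule.

```python
import random

def degrade_video(video, factor=2, noise_std=0.5, seed=0):
    """Toy degradation: average-pool a grayscale video in time and space,
    then add Gaussian noise to every pooled value.

    video: list of frames, each frame a list of rows of floats.
    factor, noise_std, seed: hypothetical parameters, not from the paper.
    """
    rng = random.Random(seed)
    t, h, w = len(video), len(video[0]), len(video[0][0])
    degraded = []
    for t0 in range(0, t - factor + 1, factor):
        frame = []
        for y0 in range(0, h - factor + 1, factor):
            row = []
            for x0 in range(0, w - factor + 1, factor):
                # Average one factor x factor x factor block (time, height, width).
                block = [video[t0 + dt][y0 + dy][x0 + dx]
                         for dt in range(factor)
                         for dy in range(factor)
                         for dx in range(factor)]
                mean = sum(block) / len(block)
                # Noise hides fine detail while the coarse layout survives.
                row.append(mean + rng.gauss(0.0, noise_std))
            frame.append(row)
        degraded.append(frame)
    return degraded

# Usage: an 8-frame, 8x8 video becomes a noisy 4-frame, 4x4 video.
clip = [[[float(x + y + f) for x in range(8)] for y in range(8)] for f in range(8)]
coarse = degrade_video(clip)
```

A video diffusion model would then denoise and upscale this coarse signal under guidance from the text prompt, which is why the low-resolution layout of the original carries over while fine details and motion can change.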

The researchers also proposed a new framework for image animation, with applications that include animating the objects and background in an image and creating dynamic camera motion. It employs simple image processing operations, such as frame replication or geometric image transformation, to turn the image into a coarse video, which is then passed through the Dreamix video editor for editing.
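The two operations named above, frame replication and a geometric transformation, can be sketched in a few lines. The helper names `replicate_frames` and `pan_right`, the horizontal-shift transform, and the nested-list image format are hypothetical choices for illustration, not the authors' code.

```python
def replicate_frames(image, num_frames):
    """Build a static coarse video by copying one image num_frames times."""
    return [[row[:] for row in image] for _ in range(num_frames)]

def pan_right(image, num_frames, step=1):
    """Build a coarse video that imitates a camera pan to the right.

    Each successive frame shifts the content left by `step` pixels and
    pads the exposed right edge with the last pixel of each row.
    """
    width = len(image[0])
    frames = []
    for i in range(num_frames):
        shift = min(i * step, width - 1)
        frames.append([row[shift:] + [row[-1]] * shift for row in image])
    return frames

# Usage: a 2x3 image turned into two different 3-frame coarse videos.
image = [[1, 2, 3],
         [4, 5, 6]]
static_clip = replicate_frames(image, 3)
panning_clip = pan_right(image, 3)
```

Either coarse video is only a rough starting point; in the described framework, the video editor is what turns it into plausible motion.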

In comparisons against state-of-the-art baselines, the researchers report superior results on quality, fidelity to the input, and alignment with the text prompt.

The post Google Introduces Dreamix, A Text-Conditioned Video Editing Model appeared first on Analytics India Magazine.