MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model

Muyao Niu¹,², Xiaodong Cun²,*, Xintao Wang², Yong Zhang², Ying Shan², Yinqiang Zheng¹,*

¹The University of Tokyo

²Tencent AI Lab

*Corresponding Author

European Conference on Computer Vision (ECCV) 2024

TL;DR: Image 🏞️ + Hybrid Controls 🕹️ = Videos 🎬🍿

MOFA-Video animates a single image using various types of control signals, including trajectories, keypoint sequences, and their combinations.

[Video panels: Image & Control | Output]

Overview

We introduce MOFA-Video, a method that adapts motions from different domains to a frozen video diffusion model. Using sparse-to-dense (S2D) motion generation and flow-based motion adaptation, MOFA-Video animates a single image under various types of control signals, including trajectories, keypoint sequences, and their combinations.
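To make the idea of flow-based adaptation more concrete, here is a toy sketch of warping a 2D grid with a dense flow field. It is not the MOFA-Adapter itself: the function name `warp_with_flow`, the backward-warping convention, nearest-neighbor lookup, and border clamping are all assumptions chosen for illustration, and the real model operates on learned features rather than a small list of numbers.

```python
def warp_with_flow(image, flow):
    """Backward-warp a 2D grid by a dense flow field (hypothetical helper).

    image: H x W list of lists holding pixel or feature values.
    flow:  H x W list of (dx, dy) tuples. The output at (x, y) is read from
           the source location (x - dx, y - dy), rounded to the nearest pixel
           and clamped to the image border (both are assumptions).
    """
    height, width = len(image), len(image[0])
    warped = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            src_x = min(max(int(round(x - dx)), 0), width - 1)
            src_y = min(max(int(round(y - dy)), 0), height - 1)
            warped[y][x] = image[src_y][src_x]
    return warped


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]

# Every pixel moves one step to the right; the left border is clamped.
shift_right = [[(1, 0)] * 3 for _ in range(3)]
warped = warp_with_flow(image, shift_right)

# A zero flow leaves the image unchanged.
zero_flow = [[(0, 0)] * 3 for _ in range(3)]
unchanged = warp_with_flow(image, zero_flow)
```

With the one-pixel shift, each row moves right and the leftmost column is repeated, which is the kind of content displacement a dense motion field encodes.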

During training, we generate sparse control signals through sparse motion sampling and then train different MOFA-Adapters to generate video through the pre-trained Stable Video Diffusion (SVD) model. At inference, different MOFA-Adapters can be combined to jointly control the frozen SVD.
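The sparse motion sampling step can be pictured as keeping the motion vectors at only a few pixels of a dense flow field, together with a mask marking where they were kept. The sketch below assumes uniform random sampling; the page does not describe the actual sampling strategy, and `sample_sparse_motion` and its parameters are hypothetical names used only for illustration.

```python
import random


def sample_sparse_motion(dense_flow, num_points, seed=0):
    """Keep flow vectors at `num_points` random pixels and zero out the rest.

    dense_flow: H x W list of (dx, dy) tuples.
    Returns (sparse_flow, mask), where mask[y][x] is 1 at sampled pixels.
    Uniform random sampling is an assumption, not the documented strategy.
    """
    height, width = len(dense_flow), len(dense_flow[0])
    rng = random.Random(seed)
    coords = [(y, x) for y in range(height) for x in range(width)]
    kept = rng.sample(coords, num_points)  # requires num_points <= H * W

    sparse_flow = [[(0.0, 0.0)] * width for _ in range(height)]
    mask = [[0] * width for _ in range(height)]
    for y, x in kept:
        sparse_flow[y][x] = dense_flow[y][x]
        mask[y][x] = 1
    return sparse_flow, mask


# Toy dense flow on a 4 x 4 grid; each vector simply encodes its position.
dense_flow = [[(float(x), float(y)) for x in range(4)] for y in range(4)]
sparse_flow, mask = sample_sparse_motion(dense_flow, num_points=3)
```

A sparse-to-dense network would then be trained to recover a dense motion field from `sparse_flow` and `mask`, so that sparse user inputs such as trajectories can be handled the same way at inference.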

Trajectory-based Image Animation

[Video panels: Image | Trajectories | Flow | Output]


Keypoint-based Facial Image Animation

1. Keypoints from a driving video

2. Keypoints from driving audio


Zero-Shot Functionalities

1. Hybrid Control

[Video panels: Image & Controls | Landmarks | Output]

2. Motion Brush

[Video panels: Image | Trajectories & Brush | Flow | Output]

3. Control Scale

[Video panels: Image & Trajectories | Flow | Scale=0 (Pure SVD) | Scale=0.3 | Scale=0.6 (Default) | Scale=1]
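One plausible reading of the control scale is a multiplier on what the adapters contribute to the frozen backbone, which would explain why a scale of 0 reduces to pure SVD. The sketch below assumes an additive, residual-style injection and a single scale shared by all adapters; neither detail is stated on this page, and `apply_adapters` is a hypothetical helper, not part of the released code.

```python
def apply_adapters(base_features, adapter_residuals, scale=0.6):
    """Add scaled adapter residuals to the backbone features (assumed form).

    base_features:     list of floats from the frozen backbone.
    adapter_residuals: list of residual lists, one per adapter.
    scale:             control scale; 0 leaves the backbone output untouched.
    """
    combined = list(base_features)
    for residual in adapter_residuals:
        if len(residual) != len(base_features):
            raise ValueError("residual and base features must have equal length")
        for i, value in enumerate(residual):
            combined[i] += scale * value
    return combined


base = [1.0, 2.0]
trajectory_residual = [0.5, -0.5]  # stand-in for a trajectory adapter output
keypoint_residual = [1.0, 0.0]     # stand-in for a keypoint adapter output
residuals = [trajectory_residual, keypoint_residual]

pure_backbone = apply_adapters(base, residuals, scale=0.0)
full_control = apply_adapters(base, residuals, scale=1.0)
default_control = apply_adapters(base, residuals)  # scale=0.6
```

Under this assumption, intermediate scales interpolate between the unconstrained backbone and fully adapter-driven motion, and summing several residuals is one simple way different adapters could act jointly.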

4. Direct Control via Optical Flow

[Video panels: Image | Flow | Output | Flow | Output]

Ablation Studies

1. Architecture of MOFA-Adapter

[Video panels: Image | Trajectories | w/o warping | w/o tuning | w/o S2D | Ours]

2. Domain-specific Tuning

[Video panels: Image | Flow | Landmarks | Output w/ Tuning | Output w/o Tuning]

BibTeX

@article{niu2024mofa,
  title={MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model},
  author={Niu, Muyao and Cun, Xiaodong and Wang, Xintao and Zhang, Yong and Shan, Ying and Zheng, Yinqiang},
  journal={arXiv preprint arXiv:2405.20222},
  year={2024}
}
