InfiniteTalk — Audio‑Driven Video Generation for Sparse‑Frame Video Dubbing (Overview)
19 Aug 2025, 00:00 Z
TL;DR InfiniteTalk is an audio‑driven dubbing framework that can generate long talking videos with synchronized lips, head/body motion, and facial expressions. It works as video‑to‑video or image‑to‑video, includes acceleration (TeaCache) and quantization options, and exposes practical flags for controlling length, quality, and VRAM.
What is InfiniteTalk?
InfiniteTalk proposes a sparse‑frame video dubbing approach that goes beyond lip‑only edits. Given an input video (V2V) or a single image (I2V) plus an audio track, it synthesizes a new video with:
- Lip synchronization to the audio
- Coordinated head movements and body posture
- Facial expressions aligned to speech
- Identity preservation across long durations
Links:
- Paper (tech report): https://arxiv.org/abs/2508.14033
Highlights
- Sparse‑frame dubbing: edits lips, head, body, and expressions, not just lips
- Infinite‑length generation: streaming mode can produce long videos
- Stability vs. prior baselines: reduces hand/body distortions (per repo notes)
- Lip accuracy: improved sync compared to MultiTalk (qualitative claims in README)
- Modes: V2V (mimics original camera motion) and I2V (single image → video)
Notes: The README mentions that very long clips can show color shift. Suggested mitigations include SDEdit for short clips and, for I2V, simple camera-movement tricks that turn the still image into a video.
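As a rough illustration of the I2V camera-movement idea in the notes above, the sketch below computes crop rectangles for a slow, centered zoom-in over a still image. It is not taken from the InfiniteTalk repository: the function name `zoom_crop_boxes`, its parameters, and the choice of a linear zoom are all assumptions made for this example.

```python
def zoom_crop_boxes(width, height, num_frames, max_zoom=1.2):
    """Return one (left, top, right, bottom) crop box per frame.

    Hypothetical helper (not part of InfiniteTalk): the zoom factor grows
    linearly from 1.0 on the first frame to max_zoom on the last frame,
    and every crop stays centered on the image.
    """
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    if max_zoom < 1.0:
        raise ValueError("max_zoom must be >= 1.0")

    boxes = []
    for i in range(num_frames):
        # Progress through the clip, from 0.0 to 1.0.
        t = i / (num_frames - 1) if num_frames > 1 else 0.0
        zoom = 1.0 + (max_zoom - 1.0) * t

        # A larger zoom means a smaller crop window around the center.
        crop_w = width / zoom
        crop_h = height / zoom
        left = (width - crop_w) / 2
        top = (height - crop_h) / 2

        boxes.append(
            (round(left), round(top), round(left + crop_w), round(top + crop_h))
        )
    return boxes


if __name__ == "__main__":
    for box in zoom_crop_boxes(1280, 720, num_frames=5, max_zoom=2.0):
        print(box)
```

Each box could then be passed to an image library's crop call (for example Pillow's `Image.crop(box)`), with the result resized back to `(width, height)` and the frames encoded with ffmpeg. Whether this matches the exact trick the README has in mind is an assumption.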
Quick Start (inference)
Environment (abbrev.):
conda create -n infinitetalk python=3.10
conda activate infinitetalk
# PyTorch (CUDA build), flash‑attn, core deps
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
pip install flash_attn==2.7.4.post1
pip install -r requirements.txt
conda install -c conda-forge ffmpeg librosa
Models (from README table):
huggingface-cli download Wan-AI/Wan2.1-I2V-14B-480P --local-dir ./weights/Wan2.1-I2V-14B-480P
huggingface-cli download TencentGameMate/chinese-wav2vec2-base --local-dir ./weights/chinese-wav2vec2-base
huggingface-cli download MeiGen-AI/InfiniteTalk --local-dir ./weights/InfiniteTalk
Run (single‑GPU, streaming mode):