Commit
·
283da2d
verified
·
0
Parent(s):
Duplicate from MeiGen-AI/MeiGen-MultiTalk
Co-authored-by: F <[email protected]>
- .gitattributes +38 -0
- README.md +88 -0
- assets/logo.png +3 -0
- assets/logo2.jpeg +3 -0
- assets/pipe.png +3 -0
- diffusion_pytorch_model.safetensors.index.json +0 -0
- multitalk.safetensors +3 -0
.gitattributes
ADDED
@@ -0,0 +1,38 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/logo.png filter=lfs diff=lfs merge=lfs -text
+assets/logo2.jpeg filter=lfs diff=lfs merge=lfs -text
+assets/pipe.png filter=lfs diff=lfs merge=lfs -text
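The `.gitattributes` rules above decide which files in this commit go through Git LFS. As a rough illustration, the sketch below checks the commit's file names against a subset of those patterns. It is only an approximation: it uses Python's `fnmatch` rather than Git's own matcher, and the helper name `is_lfs_tracked` and the rule that slash-free patterns match the file name only are simplifying assumptions of this sketch, not Git's implementation.

```python
from fnmatch import fnmatchcase
from posixpath import basename

# Subset of the patterns added in this commit, for illustration only.
LFS_PATTERNS = [
    "*.bin",
    "*.safetensors",
    "saved_model/**/*",
    "*tfevents*",
    "assets/logo.png",
    "assets/logo2.jpeg",
    "assets/pipe.png",
]


def is_lfs_tracked(path, patterns=LFS_PATTERNS):
    """Approximate gitattributes matching (hypothetical helper).

    Patterns without '/' are compared against the file name only;
    patterns containing '/' are compared against the whole path.
    """
    for pattern in patterns:
        target = path if "/" in pattern else basename(path)
        if fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for name in [
        "README.md",
        "assets/logo.png",
        "diffusion_pytorch_model.safetensors.index.json",
        "multitalk.safetensors",
    ]:
        print(name, is_lfs_tracked(name))
```

Under these assumptions, `multitalk.safetensors` and the three images match an LFS rule, while `README.md` and the `.index.json` file do not, since the latter ends in `.json` rather than `.safetensors`.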
README.md
ADDED
@@ -0,0 +1,88 @@
+---
+license: apache-2.0
+language:
+- en
+- zh
+tags:
+- video generation
+- conversational video generation
+- talking human video generation
+pipeline_tag: image-to-video
+---
+
+<p align="center">
+<img src="assets/logo2.jpeg" alt="MultiTalk" width="300"/>
+</p>
+
+# MeiGen-MultiTalk • Audio-Driven Multi-Person Conversational Video Generation
+
+
+
+<p align="left">
+<a href="https://meigen-ai.github.io/multi-talk/">
+<img
+src="https://img.shields.io/badge/MultiTalk-Website-0A66C2?logo=safari&logoColor=white" style="display: inline-block; vertical-align: middle;"
+alt="MultiTalk Website"
+/>
+</a>
+<a href="https://arxiv.org/abs/2505.22647">
+<img
+src="https://img.shields.io/badge/MultiTalk-Paper-red?logo=arxiv&logoColor=red" style="display: inline-block; vertical-align: middle;"
+alt="MultiTalk Paper on arXiv"
+/>
+</a>
+<a href="https://github.com/MeiGen-AI/MultiTalk" target="_blank" style="margin: 2px;">
+<img
+src="https://img.shields.io/badge/MultiTalk-Codebase-536af5?color=536af5&logo=github" style="display: inline-block; vertical-align: middle;"
+alt="MultiTalk Codebase"
+/>
+</a>
+
+</p>
+
+> We present **MultiTalk**, an open-source audio-driven multi-person conversational video generation model with state-of-the-art lip synchronization accuracy.
+> Key features:
+> - 💬 Realistic Conversations - Supports single & multi-person generation
+> - 👥 Interactive Character Control - Direct virtual humans via prompts
+> - 🎤 Generalization Performance - Supports generating cartoon characters and singing
+> - 📺 Resolution Flexibility - 480p & 720p output at arbitrary aspect ratios
+> - ⏱️ Long Video Generation - Supports video generation up to 15 seconds
+
+
+This repository hosts the model weights for **MultiTalk**. For installation, usage instructions, and further documentation, please visit our [GitHub repository](https://github.com/MeiGen-AI/MultiTalk).
+
+
+
+
+
+
+## Method
+We propose a novel framework, MultiTalk, for audio-driven multi-person conversational video generation. We investigate several schemes for audio injection and introduce
+the Label Rotary Position Embedding (L-RoPE) method. By assigning identical labels to audio embeddings and video latents, it effectively activates specific regions within the audio cross-attention
+map, thereby resolving incorrect audio-to-person binding. To localize the region of the specified person, we introduce adaptive person localization, which computes the similarity
+between the features of the given region of a person in the reference image and all the features of the whole video.
+
+<p align="left"><img src="assets/pipe.png" width="80%"></p>
+
+
+
+
+
+## Citation
+If you find our work helpful, please cite us.
+
+```
+@article{kong2025let,
+title={Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation},
+author={Kong, Zhe and Gao, Feng and Zhang, Yong and Kang, Zhuoliang and Wei, Xiaoming and Cai, Xunliang and Chen, Guanying and Luo, Wenhan},
+journal={arXiv preprint arXiv:2505.22647},
+year={2025}
+}
+```
+
+
+
+## License Agreement
+The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use it while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations.
+
+
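The Method section of the README describes adaptive person localization as a similarity computation between the features of a person's region in the reference image and the features of the whole video. The sketch below is one possible reading of that sentence, not the authors' implementation. Averaging the region features into a prototype, using cosine similarity, assigning each video token to its best-matching person, and the names `cosine`, `mean_vector`, and `localize_persons` are all assumptions made for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of feature vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def localize_persons(ref_region_feats, video_feats):
    """Assign each video token to the most similar reference person.

    ref_region_feats: dict mapping a person label to the feature vectors
        taken from that person's region in the reference image.
    video_feats: list of feature vectors, one per video token.
    Returns a list holding the best-matching label for every token.
    """
    # Assumption: summarize each person's region by its mean feature.
    prototypes = {
        label: mean_vector(feats) for label, feats in ref_region_feats.items()
    }
    assignments = []
    for feat in video_feats:
        scores = {
            label: cosine(proto, feat) for label, proto in prototypes.items()
        }
        assignments.append(max(scores, key=scores.get))
    return assignments


if __name__ == "__main__":
    reference = {
        "person_a": [[1.0, 0.0], [0.9, 0.1]],
        "person_b": [[0.0, 1.0], [0.1, 0.9]],
    }
    video_tokens = [[0.8, 0.2], [0.1, 0.7], [1.0, 0.1]]
    print(localize_persons(reference, video_tokens))
```

In a real model the features would be high-dimensional latents and the per-token labels would presumably feed the L-RoPE label assignment. For the actual procedure, see the paper and the GitHub repository linked in the README.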
assets/logo.png
ADDED
Git LFS Details
assets/logo2.jpeg
ADDED
Git LFS Details
assets/pipe.png
ADDED
Git LFS Details
diffusion_pytorch_model.safetensors.index.json
ADDED
The diff for this file is too large to render.
multitalk.safetensors
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f4b48e2eb148e2407711dfc29ef411820094e5684435d5791a6d34b53fe9e1db
+size 9947889040
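The three lines added for `multitalk.safetensors` are a Git LFS pointer, not the weights themselves: they record the spec version, the SHA-256 object ID, and the size in bytes of the real file. A minimal sketch of reading such a pointer and checking a downloaded file against it follows. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the parser assumes the simple `key value` layout shown in this diff.

```python
import hashlib

POINTER_TEXT = """version https://git-lfs.github.com/spec/v1
oid sha256:f4b48e2eb148e2407711dfc29ef411820094e5684435d5791a6d34b53fe9e1db
size 9947889040
"""


def parse_lfs_pointer(text):
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer, chunk_size=1 << 20):
    """Return True if the file at `path` has the recorded size and hash."""
    hasher = hashlib.new(pointer["algo"])
    total = 0
    with open(path, "rb") as handle:
        # Stream in chunks so multi-gigabyte weights never sit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
            total += len(chunk)
    return total == pointer["size"] and hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(POINTER_TEXT)
    print(pointer["algo"], f"{pointer['size'] / 1e9:.2f} GB")
```

The recorded size, 9947889040 bytes, is roughly 9.95 GB, so a size check before hashing is a cheap way to catch an incomplete download.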