mrfakename norris committed on
Commit 283da2d · verified · 0 Parent(s)

Duplicate from MeiGen-AI/MeiGen-MultiTalk

Co-authored-by: F <[email protected]>

.gitattributes ADDED
@@ -0,0 +1,38 @@
+*.7z filter=lfs diff=lfs merge=lfs -text
+*.arrow filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.bz2 filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+*.ftz filter=lfs diff=lfs merge=lfs -text
+*.gz filter=lfs diff=lfs merge=lfs -text
+*.h5 filter=lfs diff=lfs merge=lfs -text
+*.joblib filter=lfs diff=lfs merge=lfs -text
+*.lfs.* filter=lfs diff=lfs merge=lfs -text
+*.mlmodel filter=lfs diff=lfs merge=lfs -text
+*.model filter=lfs diff=lfs merge=lfs -text
+*.msgpack filter=lfs diff=lfs merge=lfs -text
+*.npy filter=lfs diff=lfs merge=lfs -text
+*.npz filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ot filter=lfs diff=lfs merge=lfs -text
+*.parquet filter=lfs diff=lfs merge=lfs -text
+*.pb filter=lfs diff=lfs merge=lfs -text
+*.pickle filter=lfs diff=lfs merge=lfs -text
+*.pkl filter=lfs diff=lfs merge=lfs -text
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.rar filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+*.tar.* filter=lfs diff=lfs merge=lfs -text
+*.tar filter=lfs diff=lfs merge=lfs -text
+*.tflite filter=lfs diff=lfs merge=lfs -text
+*.tgz filter=lfs diff=lfs merge=lfs -text
+*.wasm filter=lfs diff=lfs merge=lfs -text
+*.xz filter=lfs diff=lfs merge=lfs -text
+*.zip filter=lfs diff=lfs merge=lfs -text
+*.zst filter=lfs diff=lfs merge=lfs -text
+*tfevents* filter=lfs diff=lfs merge=lfs -text
+assets/logo.png filter=lfs diff=lfs merge=lfs -text
+assets/logo2.jpeg filter=lfs diff=lfs merge=lfs -text
+assets/pipe.png filter=lfs diff=lfs merge=lfs -text
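The `.gitattributes` rules above route matching files through Git LFS. The Python sketch below approximates that matching so you can check which repository paths would be stored in LFS. It is a simplified model, not Git's own matcher: it supports only `*`, `?` and `**/`, treats patterns without a slash as file-name matches at any depth (as gitattributes does), and ignores leading-slash anchoring and attribute unsetting. The helper names are hypothetical, and only a few of the rules are reproduced.

```python
import re

# A few rules copied from the .gitattributes above.
SAMPLE_RULES = """\
*.bin filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.tar.* filter=lfs diff=lfs merge=lfs -text
*tfevents* filter=lfs diff=lfs merge=lfs -text
saved_model/**/* filter=lfs diff=lfs merge=lfs -text
assets/logo.png filter=lfs diff=lfs merge=lfs -text
"""


def parse_gitattributes(text):
    """Return (pattern, [attributes]) pairs, skipping blank lines and comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        pattern, *attrs = line.split()
        rules.append((pattern, attrs))
    return rules


def pattern_to_regex(pattern):
    """Translate a simplified glob (*, ?, **/) into a compiled regex."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading directories
            i += 3
        elif pattern[i] == "*":
            out.append("[^/]*")  # any run of characters inside one path component
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # exactly one non-separator character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def is_lfs_tracked(path, rules):
    """True if some rule whose pattern matches `path` sets filter=lfs."""
    basename = path.rsplit("/", 1)[-1]
    for pattern, attrs in rules:
        # Patterns without a slash match the file name at any directory depth.
        target = path if "/" in pattern else basename
        if "filter=lfs" in attrs and pattern_to_regex(pattern).fullmatch(target):
            return True
    return False


RULES = parse_gitattributes(SAMPLE_RULES)
```

With these rules, `is_lfs_tracked("multitalk.safetensors", RULES)` is true while `diffusion_pytorch_model.safetensors.index.json` is not, which fits the index file being shown below as an ordinary diff rather than as LFS details.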
README.md ADDED
@@ -0,0 +1,88 @@
+---
+license: apache-2.0
+language:
+- en
+- zh
+tags:
+- video generation
+- conversational video generation
+- talking human video generation
+pipeline_tag: image-to-video
+---
+
+<p align="center">
+<img src="assets/logo2.jpeg" alt="MultiTalk" width="300"/>
+</p>
+
+# MeiGen-MultiTalk • Audio-Driven Multi-Person Conversational Video Generation
+
+
+
+<p align="left">
+<a href="https://meigen-ai.github.io/multi-talk/">
+<img
+src="https://img.shields.io/badge/MultiTalk-Website-0A66C2?logo=safari&logoColor=white" style="display: inline-block; vertical-align: middle;"
+alt="MultiTalk Website"
+/>
+</a>
+<a href="https://arxiv.org/abs/2505.22647">
+<img
+src="https://img.shields.io/badge/MultiTalk-Paper-red?logo=arxiv&logoColor=red" style="display: inline-block; vertical-align: middle;"
+alt="MultiTalk Paper on arXiv"
+/>
+</a>
+<a href="https://github.com/MeiGen-AI/MultiTalk" target="_blank" style="margin: 2px;">
+<img
+src="https://img.shields.io/badge/MultiTalk-Codebase-536af5?color=536af5&logo=github" style="display: inline-block; vertical-align: middle;"
+alt="MultiTalk Codebase"
+/>
+</a>
+
+</p>
+
+> We present **MultiTalk**, an open-source audio-driven multi-person conversational video generation model with state-of-the-art lip-synchronization accuracy.
+> **Key features:**
+> - 💬 **Realistic Conversations**: Supports single-person and multi-person generation
+> - 👥 **Interactive Character Control**: Direct virtual humans via prompts
+> - 🎤 **Generalization**: Supports generation of cartoon characters and singing
+> - 📺 **Resolution Flexibility**: 480p and 720p output at arbitrary aspect ratios
+> - ⏱️ **Long Video Generation**: Supports video generation of up to 15 seconds
+
+
+This repository hosts the model weights for **MultiTalk**. For installation, usage instructions, and further documentation, please visit our [GitHub repository](https://github.com/MeiGen-AI/MultiTalk).
+
+
+
+
+
+
+## Method
+We propose a novel framework, MultiTalk, for audio-driven multi-person conversational video generation. We investigate several schemes for audio injection and introduce
+the Label Rotary Position Embedding (L-RoPE) method. By assigning the same label to a person's audio embedding and to that person's video latents, L-RoPE activates the corresponding regions of the audio cross-attention
+map, thereby resolving incorrect audio-person binding. To localize each specified person, we introduce adaptive person localization, which computes the similarity
+between the features of that person's region in the reference image and the features of the whole video.
+
+<p align="left"><img src="assets/pipe.png" width="80%"></p>
+
+
+
+
+
+## Citation
+If you find our work helpful, please cite us.
+
+```
+@article{kong2025let,
+title={Let Them Talk: Audio-Driven Multi-Person Conversational Video Generation},
+author={Kong, Zhe and Gao, Feng and Zhang, Yong and Kang, Zhuoliang and Wei, Xiaoming and Cai, Xunliang and Chen, Guanying and Luo, Wenhan},
+journal={arXiv preprint arXiv:2505.22647},
+year={2025}
+}
+```
+
+
+
+## License Agreement
+The models in this repository are licensed under the Apache 2.0 License. We claim no rights over your generated content, granting you the freedom to use it while ensuring that your usage complies with the provisions of this license. You are fully accountable for your use of the models, which must not involve sharing any content that violates applicable laws, causes harm to individuals or groups, disseminates personal information intended for harm, spreads misinformation, or targets vulnerable populations.
+
+
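The README's Method section names two mechanisms: adaptive person localization and Label Rotary Position Embedding (L-RoPE). The NumPy sketch below illustrates both under stated assumptions; it is not the MultiTalk implementation. Localization is approximated by assigning each video token to the person whose mean reference-region feature has the highest cosine similarity (the prototype-plus-cosine choice is an assumption). L-RoPE is shown as standard 1D rotary embedding applied with integer labels in place of token positions. The function names, toy features and label values are all hypothetical; the paper's actual label assignments are not reproduced.

```python
import numpy as np


def localize_persons(video_feats, ref_feats, ref_masks):
    """Assign each video token to the person whose reference-image region it most resembles.

    video_feats: [n_video, d]; ref_feats: [n_ref, d]; ref_masks: [n_persons, n_ref] booleans.
    Returns an integer person index per video token.
    """
    # Assumption: one prototype per person, the mean feature of that person's reference region.
    protos = np.stack([ref_feats[mask].mean(axis=0) for mask in ref_masks])
    v = video_feats / np.linalg.norm(video_feats, axis=1, keepdims=True)
    p = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    return (v @ p.T).argmax(axis=1)  # cosine similarity, then best-matching person


def rope_1d(x, pos, base=10000.0):
    """Standard 1D rotary embedding: rotate feature pair i of each row by pos * base**(-2i/d)."""
    d = x.shape[-1]
    theta = base ** (-np.arange(0, d, 2) / d)  # [d/2] pair frequencies
    ang = np.asarray(pos, dtype=float)[:, None] * theta[None, :]
    cos, sin = np.cos(ang), np.sin(ang)
    x1, x2 = x[:, 0::2], x[:, 1::2]
    out = np.empty_like(x, dtype=float)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out


def label_rope_logits(q, k, q_labels, k_labels):
    """Cross-attention logits with queries and keys rotated by labels instead of positions."""
    return rope_1d(q, q_labels) @ rope_1d(k, k_labels).T


# Hypothetical toy example: two people, 3-dim reference and video features.
ref_feats = np.array([[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.1, 0.9, 0.0]])
ref_masks = np.array([[True, True, False, False], [False, False, True, True]])
video_feats = np.array([[1.0, 0.05, 0.0], [0.0, 1.0, 0.1], [0.8, 0.2, 0.0]])
person_of_token = localize_persons(video_feats, ref_feats, ref_masks)

# Hypothetical labels: each video token takes its person's label, each audio stream its person's label.
person_labels = np.array([0.0, 20.0])
rng = np.random.default_rng(0)
q = rng.normal(size=(3, 8))  # one query per video token
k = rng.normal(size=(2, 8))  # one key per person's audio stream
logits = label_rope_logits(q, k, person_labels[person_of_token], person_labels)
```

Because each 2D rotation is orthogonal, rotating the query by label a and the key by label b is equivalent to rotating only the key by b − a. The logit therefore depends only on the label difference, and equal labels reduce to the plain, unrotated dot product.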
assets/logo.png ADDED

Git LFS Details

  • SHA256: 2fb97620f1515b94de007f5b5cde23e51aaa84a5cdc1eb91c021bb46b4cae3f0
  • Pointer size: 132 Bytes
  • Size of remote file: 3.31 MB
assets/logo2.jpeg ADDED

Git LFS Details

  • SHA256: 984efa12db10f378f37ba0576be90517658ed5c4a4146f2483121e9ae8fbd800
  • Pointer size: 131 Bytes
  • Size of remote file: 446 kB
assets/pipe.png ADDED

Git LFS Details

  • SHA256: dca19575d5c512b93d0eab2359cc75878da2064d4ef0e1f44aaf6accc04d6e0a
  • Pointer size: 132 Bytes
  • Size of remote file: 1.18 MB
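Each LFS-tracked asset above is stored in Git as a small pointer file, and the listed pointer sizes follow from that format. The sketch below assumes a minimal pointer containing only the three required lines (`version`, `oid`, `size`), each ending in a newline; under that assumption only the number of digits in the file size changes the pointer length. `build_pointer` and `pointer_size` are hypothetical helpers, and the byte counts passed in are rounded stand-ins for the exact sizes, which are not shown.

```python
LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def build_pointer(oid_hex: str, size: int) -> str:
    """Minimal Git LFS pointer text: version, sha256 oid and byte size, one per line."""
    return f"version {LFS_SPEC_URL}\noid sha256:{oid_hex}\nsize {size}\n"


def pointer_size(size: int) -> int:
    """Byte length of a minimal pointer for a file of `size` bytes (oid is 64 hex chars)."""
    return len(build_pointer("0" * 64, size).encode("ascii"))


# 43 (version line) + 76 (oid line) + len("size ") + digits + 1 = 125 + digits bytes.
print(pointer_size(3_310_000))  # logo.png, ~3.31 MB, 7 digits -> 132
print(pointer_size(446_000))    # logo2.jpeg, ~446 kB, 6 digits -> 131
```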
diffusion_pytorch_model.safetensors.index.json ADDED
The diff for this file is too large to render.
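The index file itself is not rendered here. Assuming it follows the usual Hugging Face sharded-checkpoint layout, with a `metadata.total_size` field and a `weight_map` from tensor names to the `.safetensors` file that stores each tensor, a short script can summarize how many tensors each file holds. `shard_summary` is a hypothetical helper, and the tensor and file names in the example are made up.

```python
import json
from collections import Counter
from typing import Optional, Tuple


def shard_summary(index_text: str) -> Tuple[Optional[int], Counter]:
    """Return (total_size, tensors-per-file) from a sharded-checkpoint index JSON string."""
    index = json.loads(index_text)
    total_size = index.get("metadata", {}).get("total_size")
    per_file = Counter(index["weight_map"].values())
    return total_size, per_file


# Made-up miniature index; a real index maps many more tensors.
example = json.dumps({
    "metadata": {"total_size": 1234},
    "weight_map": {
        "blocks.0.attn.q.weight": "model-a.safetensors",
        "blocks.0.attn.k.weight": "model-a.safetensors",
        "blocks.0.ffn.weight": "model-b.safetensors",
    },
})
total, per_file = shard_summary(example)
```

For the real file, the same call would take the text of `diffusion_pytorch_model.safetensors.index.json` read from disk.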
 
multitalk.safetensors ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:f4b48e2eb148e2407711dfc29ef411820094e5684435d5791a6d34b53fe9e1db
+size 9947889040
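The `multitalk.safetensors` entry above is the raw Git LFS pointer rather than the weights themselves; its `size` line gives 9,947,889,040 bytes (about 9.95 GB, or 9.26 GiB). A minimal parser for this three-line format is sketched below; `parse_lfs_pointer` is a hypothetical helper that checks only the fields visible here and ignores any other keys.

```python
POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:f4b48e2eb148e2407711dfc29ef411820094e5684435d5791a6d34b53fe9e1db
size 9947889040
"""


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into {'oid': hex, 'size': int}."""
    fields = {}
    for line in text.splitlines():
        if line:
            key, _, value = line.partition(" ")
            fields[key] = value
    if fields.get("version") != "https://git-lfs.github.com/spec/v1":
        raise ValueError("not a Git LFS v1 pointer")
    algo, _, digest = fields.get("oid", "").partition(":")
    if algo != "sha256" or len(digest) != 64:
        raise ValueError("expected a sha256 oid")
    return {"oid": digest, "size": int(fields["size"])}


info = parse_lfs_pointer(POINTER)
print(f"{info['size'] / 1e9:.2f} GB")  # prints 9.95 GB
```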