wangrongsheng

AI & ML interests

None yet

Recent Activity

liked a model 7 days ago
alibaba-pai/Wan2.1-Fun-1.3B-Control
published a model 15 days ago
wangrongsheng/Med-R1
liked a model 17 days ago
ds4sd/SmolDocling-256M-preview

Organizations

NatureAI, QiYuan-tech, PandaVT, pandalla

wangrongsheng's activity

reacted to KaiChen1998's post with šŸ‘ 19 days ago
šŸ“¢ Our EMOVA paper has been accepted by CVPR 2025, and we are glad to release all resources, including code (training & inference), datasets (training & evaluation), and checkpoints (EMOVA-3B/7B/72B)!

šŸ¤— EMOVA is a novel end-to-end omni-modal LLM that can see, hear, and speak. Given omni-modal (i.e., textual, visual, and speech) inputs, EMOVA generates both textual and speech responses with vivid emotional control via its speech decoder and a style controller.
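To make that last idea concrete, below is a minimal, self-contained PyTorch sketch of the general pattern the post describes: a style controller that maps an emotion label to an embedding, which then conditions a speech-token decoder. Every name here (`StyleController`, `ToySpeechDecoder`, the emotion label set) is an illustrative assumption for exposition, not EMOVA's actual code.

```python
# Illustrative sketch only -- NOT EMOVA's implementation. It shows one way a
# style controller could inject an emotion embedding into a speech decoder.
import torch
import torch.nn as nn

EMOTIONS = ["neutral", "happy", "sad", "angry"]  # hypothetical label set


class StyleController(nn.Module):
    """Maps a discrete emotion label to a conditioning vector."""

    def __init__(self, hidden_dim: int):
        super().__init__()
        self.embed = nn.Embedding(len(EMOTIONS), hidden_dim)

    def forward(self, emotion: str) -> torch.Tensor:
        idx = torch.tensor([EMOTIONS.index(emotion)])
        return self.embed(idx)  # shape: (1, hidden_dim)


class ToySpeechDecoder(nn.Module):
    """Tiny stand-in for a speech-token decoder conditioned on style."""

    def __init__(self, hidden_dim: int = 64, n_speech_tokens: int = 1024):
        super().__init__()
        self.rnn = nn.GRU(hidden_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, n_speech_tokens)

    def forward(self, llm_states: torch.Tensor, style: torch.Tensor) -> torch.Tensor:
        # Broadcast-add the style vector to every timestep of the LLM output,
        # then decode to speech-token logits.
        conditioned = llm_states + style.unsqueeze(1)
        out, _ = self.rnn(conditioned)
        return self.head(out)  # shape: (batch, time, n_speech_tokens)


# Usage: fake hidden states from an LLM backbone, decoded as "happy" speech.
llm_states = torch.randn(1, 10, 64)
controller = StyleController(hidden_dim=64)
decoder = ToySpeechDecoder()
logits = decoder(llm_states, controller("happy"))
print(logits.shape)  # torch.Size([1, 10, 1024])
```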

āœØ EMOVA Highlights
āœ… State-of-the-art omni-modality: EMOVA achieves results comparable to the state of the art on both vision-language and speech benchmarks simultaneously.
āœ… Device adaptation: our codebase supports training/inference on both NVIDIA GPUs (e.g., A800 & H20) and Ascend NPUs (e.g., 910B3)!
āœ… Modular design: we integrate multiple implementations of the vision encoder, vision projector, and language model, even including the most recent DeepSeekMoE-tiny! (See the sketch after this list for the general plug-in pattern.)
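The modular-design bullet suggests components are swappable behind a common interface. Here is a minimal, hypothetical Python sketch of such a plug-in registry; the component kinds, builder names, and config keys are assumptions for illustration, not the structure of the EMOVA codebase.

```python
# Illustrative sketch of a component registry -- an assumption about how a
# modular multimodal codebase might wire vision-encoder / projector / LLM
# choices from a config; not EMOVA's actual implementation.
from typing import Callable, Dict

REGISTRY: Dict[str, Dict[str, Callable]] = {
    "vision_encoder": {},
    "vision_projector": {},
    "language_model": {},
}


def register(kind: str, name: str):
    """Decorator that files a builder function under (kind, name)."""

    def wrap(builder: Callable) -> Callable:
        REGISTRY[kind][name] = builder
        return builder

    return wrap


@register("vision_encoder", "clip-vit")
def build_clip_vit():
    return "clip-vit encoder"  # stand-in for the real constructor


@register("language_model", "deepseekmoe-tiny")
def build_deepseekmoe_tiny():
    return "deepseekmoe-tiny LLM"  # stand-in for the real constructor


def build_from_config(config: Dict[str, str]):
    """Instantiate each component named in the config."""
    return {kind: REGISTRY[kind][name]() for kind, name in config.items()}


# Usage: swapping the language model is a one-line config change.
model = build_from_config({
    "vision_encoder": "clip-vit",
    "language_model": "deepseekmoe-tiny",
})
print(model)
```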

šŸ”„ You are all welcome to try and star!
- Project page: https://emova-ollm.github.io/
- Github: https://github.com/emova-ollm/EMOVA
- Demo: Emova-ollm/EMOVA-demo