LI
RogerZhuo
AI & ML interests
None yet
Recent Activity
liked a Space about 1 month ago: Qwen/Qwen3-TTS
liked a dataset about 1 month ago: MiniMaxAI/OctoCodingBench
liked a model 2 months ago: microsoft/TRELLIS.2-4B
Organizations
Reading
Music
- ElectricAlexis/NotaGen
  Updated • 150
- ASLP-lab/LLaSE-G1
  Audio-to-Audio • Updated • 26
- Di♪♪Rhythm
  Running on Zero • Featured • 678
  🎶 Blazingly Fast and Embarrassingly Simple Song Generation
- DiffRhythm: Blazingly Fast and Embarrassingly Simple End-to-End Full-Length Song Generation with Latent Diffusion
  Paper • 2503.01183 • Published • 29
AI Arena
I2V
image-to-video
- Wan-AI/Wan2.1-T2V-1.3B
  Text-to-Video • Updated • 13k • 433
- VBench: Comprehensive Benchmark Suite for Video Generative Models
  Paper • 2311.17982 • Published • 9
- VBench++: Comprehensive and Versatile Benchmark Suite for Video Generative Models
  Paper • 2411.13503 • Published • 34
- tencent/HunyuanVideo-I2V
  Image-to-Video • Updated • 140 • 349
LLM
Foundation-model related
must-read-papers
AI Papers
- Reinforcement Learning: An Overview
  Paper • 2412.05265 • Published • 8
- Fish-Speech: Leveraging Large Language Models for Advanced Multilingual Text-to-Speech Synthesis
  Paper • 2411.01156 • Published • 13
- VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness
  Paper • 2503.21755 • Published • 33
- Qwen2.5-Omni Technical Report
  Paper • 2503.20215 • Published • 170
OCR
images
images
- black-forest-labs/FLUX.1-dev
  Text-to-Image • Updated • 705k • 12.3k
- cagliostrolab/animagine-xl-4.0
  Text-to-Image • Updated • 276k • 389
- Thera Arbitrary-Scale Super-Resolution
  Running on L4 • Featured • 283
  🔥 Upscale images to any size with high‑quality super‑resolution
- stepfun-ai/Step1X-Edit
  Image-to-Image • Updated • 55 • 327
TTS
Speech-related
- VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
  Paper • 2307.16430 • Published • 4
- Zyphra/Zonos-v0.1-transformer
  Text-to-Speech • Updated • 10.1k • 428
- IndexTTS: An Industrial-Level Controllable and Efficient Zero-Shot Text-To-Speech System
  Paper • 2502.05512 • Published • 7
- Step-Audio: Unified Understanding and Generation in Intelligent Speech Interaction
  Paper • 2502.11946 • Published • 3
virtual try-on
Virtual makeover
- Learning Flow Fields in Attention for Controllable Person Image Generation
  Paper • 2412.08486 • Published • 36
- franciszzj/Leffa
  Image-to-Image • Updated • 340
- TryOffDiff: Virtual-Try-Off via High-Fidelity Garment Reconstruction using Diffusion Models
  Paper • 2411.18350 • Published • 28
- TryOffDiff
  Running on Zero • 63
  🔥 Extract garment images from everyday images!
Data