X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance • Paper • 2303.15764 • Published Mar 28, 2023 • 2
JM3D & JM3D-LLM: Elevating 3D Representation with Joint Multi-modal Cues • Paper • 2310.09503 • Published Oct 14, 2023 • 1
X-Oscar: A Progressive Framework for High-quality Text-guided 3D Animatable Avatar Generation • Paper • 2405.00954 • Published May 2, 2024
TraDiffusion: Trajectory-Based Training-Free Image Generation • Paper • 2408.09739 • Published Aug 19, 2024 • 9
Multi-branch Collaborative Learning Network for 3D Visual Grounding • Paper • 2407.05363 • Published Jul 7, 2024
Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension • Paper • 2411.13093 • Published Nov 20, 2024 • 2
QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension • Paper • 2503.08689 • Published Mar 11, 2025 • 4
JavisDiT: Joint Audio-Video Diffusion Transformer with Hierarchical Spatio-Temporal Prior Synchronization • Paper • 2503.23377 • Published Mar 30, 2025 • 57
ControlMLLM: Training-Free Visual Prompt Learning for Multimodal Large Language Models • Paper • 2407.21534 • Published Jul 31, 2024 • 5
Zooming In on Fakes: A Novel Dataset for Localized AI-Generated Image Detection with Forgery Amplification Approach • Paper • 2504.11922 • Published Apr 16, 2025 • 1
Towards Semantic Equivalence of Tokenization in Multimodal LLM • Paper • 2406.05127 • Published Jun 7, 2024
RePrompt: Reasoning-Augmented Reprompting for Text-to-Image Generation via Reinforcement Learning • Paper • 2505.17540 • Published May 23, 2025 • 7
SpaCE-10: A Comprehensive Benchmark for Multimodal Large Language Models in Compositional Spatial Intelligence • Paper • 2506.07966 • Published Jun 9, 2025 • 1
AIGI-Holmes: Towards Explainable and Generalizable AI-Generated Image Detection via Multimodal Large Language Models • Paper • 2507.02664 • Published Jul 3, 2025 • 4
Training-Free Multimodal Large Language Model Orchestration • Paper • 2508.10016 • Published Aug 6, 2025
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation • Paper • 2503.17671 • Published Mar 22, 2025
MetaCaptioner: Towards Generalist Visual Captioning with Open-source Suites • Paper • 2510.12126 • Published Oct 14, 2025 • 1
A Survey on 3D Gaussian Splatting Applications: Segmentation, Editing, and Generation • Paper • 2508.09977 • Published Aug 13, 2025
JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation • Paper • 2512.22905 • Published 18 days ago • 18