VITA-1.5: Towards GPT-4o Level Real-Time Vision and Speech Interaction • Paper • 2501.01957 • Published 27 days ago • 42
Explanatory Instructions: Towards Unified Vision Tasks Understanding and Zero-shot Generalization • Paper • 2412.18525 • Published Dec 24, 2024 • 72
Test-Time Preference Optimization: On-the-Fly Alignment via Iterative Textual Feedback • Paper • 2501.12895 • Published 9 days ago • 51
EchoVideo: Identity-Preserving Human Video Generation by Multimodal Feature Fusion • Paper • 2501.13452 • Published 8 days ago • 7
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback • Paper • 2501.10799 • Published 13 days ago • 14
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding • Paper • 2501.13200 • Published 8 days ago • 61
GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing • Paper • 2501.13925 • Published 7 days ago • 5
AdaIR: Adaptive All-in-One Image Restoration via Frequency Mining and Modulation • Paper • 2403.14614 • Published Mar 21, 2024 • 3
RealCritic: Towards Effectiveness-Driven Evaluation of Language Model Critiques • Paper • 2501.14492 • Published 7 days ago • 29
Building Foundations for Natural Language Processing of Historical Turkish: Resources and Models • Paper • 2501.04828 • Published 22 days ago • 11
Fine-tuning Large Language Models with Human-inspired Learning Strategies in Medical Question Answering • Paper • 2408.07888 • Published Aug 15, 2024 • 13
OpenResearcher: Unleashing AI for Accelerated Scientific Research • Paper • 2408.06941 • Published Aug 13, 2024 • 32
Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents • Paper • 2408.07060 • Published Aug 13, 2024 • 42
mPLUG-Owl3: Towards Long Image-Sequence Understanding in Multi-Modal Large Language Models • Paper • 2408.04840 • Published Aug 9, 2024 • 34
The Lessons of Developing Process Reward Models in Mathematical Reasoning • Paper • 2501.07301 • Published 18 days ago • 89