Collections
Collections including paper arxiv:2502.14282
-
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper • 2501.13200 • Published • 64
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models
Paper • 2502.09604 • Published • 30
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models
Paper • 2502.10458 • Published • 27
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
Paper • 2502.14282 • Published • 14
-
Agents Thinking Fast and Slow: A Talker-Reasoner Architecture
Paper • 2410.08328 • Published
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks
Paper • 2305.17390 • Published • 2
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding
Paper • 2501.13200 • Published • 64
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems
Paper • 2502.11098 • Published • 10
-
TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks
Paper • 2412.14161 • Published • 51
Training Software Engineering Agents and Verifiers with SWE-Gym
Paper • 2412.21139 • Published • 22
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis
Paper • 2412.19723 • Published • 82
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation
Paper • 2408.00764 • Published • 1
-
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 26
TextSquare: Scaling up Text-Centric Visual Instruction Tuning
Paper • 2404.12803 • Published • 30
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models
Paper • 2404.13013 • Published • 31
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Paper • 2404.06512 • Published • 30
-
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 192
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Paper • 2311.16502 • Published • 35
BLINK: Multimodal Large Language Models Can See but Not Perceive
Paper • 2404.12390 • Published • 26
RULER: What's the Real Context Size of Your Long-Context Language Models?
Paper • 2404.06654 • Published • 35