Models
Datasets
Spaces
Posts
Docs
Enterprise
Pricing
Log In
Sign Up

Collections

Discover the best community collections!

Collections including paper arxiv:2502.14282

PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Paper • 2502.14282 • Published 3 days ago • 14

about 3 hours ago

SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

Paper • 2501.13200 • Published Jan 22 • 64
SelfCite: Self-Supervised Alignment for Context Attribution in Large Language Models

Paper • 2502.09604 • Published 9 days ago • 30
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models

Paper • 2502.10458 • Published 11 days ago • 27
PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC

Paper • 2502.14282 • Published 3 days ago • 14

Agents Thinking Fast and Slow: A Talker-Reasoner Architecture

Paper • 2410.08328 • Published Oct 10, 2024
SwiftSage: A Generative Agent with Fast and Slow Thinking for Complex Interactive Tasks

Paper • 2305.17390 • Published May 27, 2023 • 2
SRMT: Shared Memory for Multi-agent Lifelong Pathfinding

Paper • 2501.13200 • Published Jan 22 • 64
Talk Structurally, Act Hierarchically: A Collaborative Framework for LLM Multi-Agent Systems

Paper • 2502.11098 • Published 7 days ago • 10

about 19 hours ago

TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

Paper • 2412.14161 • Published Dec 18, 2024 • 51
Training Software Engineering Agents and Verifiers with SWE-Gym

Paper • 2412.21139 • Published Dec 30, 2024 • 22
OS-Genesis: Automating GUI Agent Trajectory Construction via Reverse Task Synthesis

Paper • 2412.19723 • Published Dec 27, 2024 • 82
AgentGen: Enhancing Planning Abilities for Large Language Model based Agent via Environment and Task Generation

Paper • 2408.00764 • Published Aug 1, 2024 • 1

Vision Language Models

about 19 hours ago

BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 26
TextSquare: Scaling up Text-Centric Visual Instruction Tuning

Paper • 2404.12803 • Published Apr 19, 2024 • 30
Groma: Localized Visual Tokenization for Grounding Multimodal Large Language Models

Paper • 2404.13013 • Published Apr 19, 2024 • 31
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD

Paper • 2404.06512 • Published Apr 9, 2024 • 30

about 19 hours ago

GAIA: a benchmark for General AI Assistants

Paper • 2311.12983 • Published Nov 21, 2023 • 192
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI

Paper • 2311.16502 • Published Nov 27, 2023 • 35
BLINK: Multimodal Large Language Models Can See but Not Perceive

Paper • 2404.12390 • Published Apr 18, 2024 • 26
RULER: What's the Real Context Size of Your Long-Context Language Models?

Paper • 2404.06654 • Published Apr 9, 2024 • 35

Company

TOS Privacy About Jobs

Website

Models Datasets Spaces Pricing Docs