-
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Paper • 2310.03714 • Published • 34 -
ReST meets ReAct: Self-Improvement for Multi-Step Reasoning LLM Agent
Paper • 2312.10003 • Published • 40 -
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation Framework
Paper • 2308.08155 • Published • 7 -
GAIA: a benchmark for General AI Assistants
Paper • 2311.12983 • Published • 195
Collections
Discover the best community collections!
Collections including paper arxiv:2412.04454
-
End-to-End Goal-Driven Web Navigation
Paper • 1602.02261 • Published -
Learning Language Games through Interaction
Paper • 1606.02447 • Published -
Naturalizing a Programming Language via Interactive Learning
Paper • 1704.06956 • Published -
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Paper • 1802.08802 • Published
-
LoRA+: Efficient Low Rank Adaptation of Large Models
Paper • 2402.12354 • Published • 6 -
The FinBen: An Holistic Financial Benchmark for Large Language Models
Paper • 2402.12659 • Published • 21 -
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization
Paper • 2402.13249 • Published • 13 -
TrustLLM: Trustworthiness in Large Language Models
Paper • 2401.05561 • Published • 69
-
Set-of-Mark Prompting Unleashes Extraordinary Visual Grounding in GPT-4V
Paper • 2310.11441 • Published • 28 -
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper • 2501.12326 • Published • 54 -
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Paper • 2406.08451 • Published • 25 -
GUI-WORLD: A Dataset for GUI-oriented Multimodal LLM-based Agents
Paper • 2406.10819 • Published • 1
-
xlangai/Aguvis-7B-720P
Updated • 11.1k • 6 -
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 64 -
SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents
Paper • 2401.10935 • Published • 4 -
cckevinn/SeeClick
Text Generation • Updated • 4.63k • 16
-
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published • 81 -
OmniParser for Pure Vision Based GUI Agent
Paper • 2408.00203 • Published • 25 -
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 64 -
THUDM/cogagent-9b-20241220
Image-Text-to-Text • Updated • 1.62k • 52
-
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Paper • 2411.17465 • Published • 81 -
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 64 -
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials
Paper • 2412.09605 • Published • 28
-
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction
Paper • 2412.04454 • Published • 64 -
Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Paper • 2410.05243 • Published • 19 -
UI-TARS: Pioneering Automated GUI Interaction with Native Agents
Paper • 2501.12326 • Published • 54