LocAgent: Graph-Guided LLM Agents for Code Localization Paper • 2503.09089 • Published 1 day ago • 5
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published 3 days ago • 13
MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents Paper • 2503.01935 • Published 11 days ago • 24
MMVU: Measuring Expert-Level Multi-Discipline Video Understanding Paper • 2501.12380 • Published Jan 21 • 84
OpenDevin: An Open Platform for AI Software Developers as Generalist Agents Paper • 2407.16741 • Published Jul 23, 2024 • 71
StarCoder 2 and The Stack v2: The Next Generation Paper • 2402.19173 • Published Feb 29, 2024 • 138
ChatCell: Facilitating Single-Cell Analysis with Natural Language Paper • 2402.08303 • Published Feb 13, 2024 • 13
Prioritizing Safeguarding Over Autonomy: Risks of LLM Agents for Science Paper • 2402.04247 • Published Feb 6, 2024 • 2
Igniting Language Intelligence: The Hitchhiker's Guide From Chain-of-Thought Reasoning to Language Agents Paper • 2311.11797 • Published Nov 20, 2023 • 2
QTSumm: A New Benchmark for Query-Focused Table Summarization Paper • 2305.14303 • Published May 23, 2023
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity Paper • 2310.07521 • Published Oct 11, 2023
Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data? Paper • 2309.08963 • Published Sep 16, 2023 • 11
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs Paper • 2307.16789 • Published Jul 31, 2023 • 100
Large Language Models are Effective Table-to-Text Generators, Evaluators, and Feedback Providers Paper • 2305.14987 • Published May 24, 2023 • 1
RWKV: Reinventing RNNs for the Transformer Era Paper • 2305.13048 • Published May 22, 2023 • 17
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts Paper • 2202.01279 • Published Feb 2, 2022
Crosslingual Generalization through Multitask Finetuning Paper • 2211.01786 • Published Nov 3, 2022 • 2