- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 192
- ToolTalk: Evaluating Tool-Usage in a Conversational Setting
  Paper • 2311.10775 • Published • 10
- TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
  Paper • 2311.11315 • Published • 8
- An Embodied Generalist Agent in 3D World
  Paper • 2311.12871 • Published • 8
Collections including paper arxiv:2402.01622

- GPQA: A Graduate-Level Google-Proof Q&A Benchmark
  Paper • 2311.12022 • Published • 31
- GAIA: a benchmark for General AI Assistants
  Paper • 2311.12983 • Published • 192
- gorilla-llm/APIBench
  Updated • 156 • 66
- Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
  Paper • 2312.04724 • Published • 20

- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
  Paper • 2310.15123 • Published • 8
- ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
  Paper • 2310.13227 • Published • 13
- LASER: LLM Agent with State-Space Exploration for Web Navigation
  Paper • 2309.08172 • Published • 13
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
  Paper • 2310.04406 • Published • 10

- Branch-Solve-Merge Improves Large Language Model Evaluation and Generation
  Paper • 2310.15123 • Published • 8
- Diversity of Thought Improves Reasoning Abilities of Large Language Models
  Paper • 2310.07088 • Published • 5
- ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search
  Paper • 2310.13227 • Published • 13
- Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
  Paper • 2310.04406 • Published • 10

- MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
  Paper • 2309.04662 • Published • 23
- Neurons in Large Language Models: Dead, N-gram, Positional
  Paper • 2309.04827 • Published • 17
- Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs
  Paper • 2309.05516 • Published • 10
- DrugChat: Towards Enabling ChatGPT-Like Capabilities on Drug Molecule Graphs
  Paper • 2309.03907 • Published • 12