MCP-Bench: Benchmarking Tool-Using LLM Agents with Complex Real-World Tasks via MCP Servers Paper • 2508.20453 • Published 8 days ago • 56
Article Code Llama: Llama 2 learns to code By philschmid and 7 others • Aug 25, 2023 • 10
MLLM-as-a-Judge for Image Safety without Human Labeling Paper • 2501.00192 • Published Dec 31, 2024 • 32