SWE-bench Collection — SWE-bench is a benchmark for evaluating language models and AI systems on their ability to resolve real-world GitHub issues. • 4 items • Updated Mar 8, 2025