Update README.md
README.md
CHANGED
@@ -49,9 +49,6 @@ print(outputs[0].outputs[0].text)
 
 # 📃Evaluation
 
-LUFFY is evaluated on six competition-level benchmarks, achieving state-of-the-art results among all zero-RL methods. It surpasses both on-policy RL and imitation learning (SFT), especially in generalization:
-
-## LUFFY on Qwen2.5-Instruct-7B
 | **Model** | **AIME 2024** | **AIME 2025** | **AMC** | **MATH-500** | **Minerva** | **Olympiad** | **Avg.** |
 |-----------------------------------|-------------|-------------|---------|---------------|-------------|---------------|----------|
 | Qwen2.5-7B-Instruct | 11.9 | 7.6 | 44.1 | 74.6 | 30.5 | 39.7 | 34.7 |
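For reference, the **Avg.** column in the retained table appears to be the unweighted mean of the six benchmark scores; the snippet below is a minimal sketch, assuming simple averaging (the README does not state the weighting), that reproduces the Qwen2.5-7B-Instruct value.

```python
# Sketch: check that "Avg." matches the unweighted mean of the six benchmarks
# (assumption: simple averaging; the README does not specify the weighting).
scores = {
    "AIME 2024": 11.9,
    "AIME 2025": 7.6,
    "AMC": 44.1,
    "MATH-500": 74.6,
    "Minerva": 30.5,
    "Olympiad": 39.7,
}

avg = sum(scores.values()) / len(scores)
print(round(avg, 1))  # 34.7, matching the Qwen2.5-7B-Instruct row
```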