Update README.md (#2)

README.md

@@ -4,15 +4,15 @@ library_name: transformers
 tags: []
 ---
-# R1-AQA
 <!-- Provide a quick summary of what the model is/does. -->
 ## Introduction
 R1-AQA is an audio question answering (AQA) model based on `Qwen2-Audio-7B-Instruct`, optimized through reinforcement learning using the group relative policy optimization (GRPO) algorithm.
-This implementation achieved state-of-the-art performance on MMAU *test-mini* benchmark with only 38k post-training samples.
-For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa) and [Report]().
 ### Table: Accuracies (%) on MMAU Test-mini benchmark
 | Model                                      | Method                  | Sound  | Music  | Speech | Average |
@@ -29,7 +29,7 @@ For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa)
 | SALMONN                                    | Direct Inference\*      | 41.00  | 34.80  | 25.50  | 33.70   |
 | Qwen2-Audio-7B-Instruct                    | CoTA \[1\]            | 60.06  | 64.30  | 60.70  | 61.71   |
 | Qwen2-Audio-7B-Instruct                    | Zero-Shot-CoT \[2\]   | 61.86  | 56.29  | 55.26  | 57.80   |
-| Qwen2-Audio-7B-Instruct                    | **GRPO (Ours)**         | **69.37** | 66.77  | 57.36  | **64.50** |
 #### Notes:
 \* The data are sourced from the MMAU official website: [https://sakshi113.github.io/mmau_homepage/](https://sakshi113.github.io/mmau_homepage/)

 tags: []
 ---
+# R1-AQA --- Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering
 <!-- Provide a quick summary of what the model is/does. -->
 ## Introduction
 R1-AQA is an audio question answering (AQA) model based on `Qwen2-Audio-7B-Instruct`, optimized through reinforcement learning using the group relative policy optimization (GRPO) algorithm.
+This implementation has achieved state-of-the-art performance on MMAU *Test-mini* benchmark with only 38k post-training samples.
+For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa) and [Technical Report](https://arxiv.org/abs/2503.11197).
 ### Table: Accuracies (%) on MMAU Test-mini benchmark
 | Model                                      | Method                  | Sound  | Music  | Speech | Average |
 | SALMONN                                    | Direct Inference\*      | 41.00  | 34.80  | 25.50  | 33.70   |
 | Qwen2-Audio-7B-Instruct                    | CoTA \[1\]            | 60.06  | 64.30  | 60.70  | 61.71   |
 | Qwen2-Audio-7B-Instruct                    | Zero-Shot-CoT \[2\]   | 61.86  | 56.29  | 55.26  | 57.80   |
+| **Qwen2-Audio-7B-Instruct**                | **GRPO (Ours)**         | **69.37** | 66.77  | 57.36  | **64.50** |
 #### Notes:
 \* The data are sourced from the MMAU official website: [https://sakshi113.github.io/mmau_homepage/](https://sakshi113.github.io/mmau_homepage/)