Audio-Text-to-Text
Transformers
Safetensors
qwen2_audio
text2text-generation
Inference Endpoints
GrantL10 committed
Commit 5123f60 · verified · 1 Parent(s): c0e9ff4

Update README.md (#2)


- Update README.md (2654f85152309f063ece1333ffab3a02a6ef7adf)


Co-authored-by: Grant Lee <[email protected]>

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -4,15 +4,15 @@ library_name: transformers
 tags: []
 ---

-# R1-AQA
+# R1-AQA --- Reinforcement Learning Outperforms Supervised Fine-Tuning: A Case Study on Audio Question Answering

 <!-- Provide a quick summary of what the model is/does. -->

 ## Introduction

 R1-AQA is an audio question answering (AQA) model based on `Qwen2-Audio-7B-Instruct`, optimized through reinforcement learning using the group relative policy optimization (GRPO) algorithm.
-This implementation achieved state-of-the-art performance on MMAU *test-mini* benchmark with only 38k post-training samples.
-For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa) and [Report]().
+This implementation has achieved state-of-the-art performance on the MMAU *Test-mini* benchmark with only 38k post-training samples.
+For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa) and [Technical Report](https://arxiv.org/abs/2503.11197).

 ### Table: Accuracies (%) on MMAU Test-mini benchmark
 | Model | Method | Sound | Music | Speech | Average |
@@ -29,7 +29,7 @@ For more details, please refer to our [Github](https://github.com/xiaomi/r1-aqa)
 | SALMONN | Direct Inference\* | 41.00 | 34.80 | 25.50 | 33.70 |
 | Qwen2-Audio-7B-Instruct | CoTA \[1\] | 60.06 | 64.30 | 60.70 | 61.71 |
 | Qwen2-Audio-7B-Instruct | Zero-Shot-CoT \[2\] | 61.86 | 56.29 | 55.26 | 57.80 |
-| Qwen2-Audio-7B-Instruct | **GRPO (Ours)** | **69.37** | 66.77 | 57.36 | **64.50** |
+| **Qwen2-Audio-7B-Instruct** | **GRPO (Ours)** | **69.37** | 66.77 | 57.36 | **64.50** |

 #### Notes:
 \* The data are sourced from the MMAU official website: [https://sakshi113.github.io/mmau_homepage/](https://sakshi113.github.io/mmau_homepage/)
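
Since the card describes an AQA model built on `Qwen2-Audio-7B-Instruct`, a minimal inference sketch using the standard Qwen2-Audio interface in `transformers` may help; the repo id, audio URL, and question below are placeholders (not specified in this commit), so treat this as an assumption-laden sketch rather than an official usage snippet.

```python
# Minimal sketch of AQA inference via the Qwen2-Audio chat interface in transformers.
# Assumptions: the repo id, audio URL, and question are placeholders.
from io import BytesIO
from urllib.request import urlopen

import librosa
from transformers import AutoProcessor, Qwen2AudioForConditionalGeneration

model_id = "<r1-aqa-repo-id>"  # placeholder: replace with this model's Hub id
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2AudioForConditionalGeneration.from_pretrained(model_id, device_map="auto")

# Build a chat-style prompt with one audio clip and one question (example values).
audio_url = "https://example.com/sample.wav"  # placeholder audio clip
conversation = [
    {"role": "user", "content": [
        {"type": "audio", "audio_url": audio_url},
        {"type": "text", "text": "Which instrument is playing in this clip?"},
    ]},
]
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)

# Load the waveform at the processor's expected sampling rate.
audio, _ = librosa.load(BytesIO(urlopen(audio_url).read()),
                        sr=processor.feature_extractor.sampling_rate)

inputs = processor(text=text, audios=[audio], return_tensors="pt", padding=True).to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=256)
output_ids = output_ids[:, inputs.input_ids.shape[1]:]  # strip the prompt tokens
answer = processor.batch_decode(output_ids, skip_special_tokens=True)[0]
print(answer)
```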