HPAI-BSC/Qwen2.5-72B-Instruct-Egida-DPO

danihinjos committed 22 days ago

Commit ee28555 (verified) · 1 parent: c73e0b2

Update README.md

Files changed (1)

README.md +21 -2

@@ -1,5 +1,13 @@
 ---
 license: apache-2.0
 ---
 ## Model Description
@@ -31,14 +39,25 @@ dataset for this model. This results in a DPO dataset composed by triplets < ”
 |                              | Egida (test) ↓ | DELPHI ↓ | Alert-Base ↓ | Alert-Adv ↓ |
 |------------------------------|:--------------:|:--------:|:------------:|:-----------:|
 | Qwen-2.5-72B-Instruct        |     0.235      |  0.051   |    0.329     |    0.050    |
-| Qwen-2.5-72B-Egida-DPO       |     0.125      |  0.042   |    0.210     |    0.019    |
 ### General Purpose Performance
 |                              | OpenLLM Leaderboard (Average) ↑ | MMLU Generative (ROUGE1) ↑ |
 |------------------------------|:---------------------:|:---------------:|
 | Qwen-2.5-72B-Instruct        |         0.618         |      0.771      |
-| Qwen-2.5-72B-Egida-DPO       |         0.620         |      0.768      |
 ## Environmental Impact

 ---
 license: apache-2.0
+datasets:
+- HPAI-BSC/Egida
+language:
+- en
+base_model:
+- Qwen/Qwen2.5-72B-Instruct
+tags:
+- safety
 ---
 ## Model Description
 |                              | Egida (test) ↓ | DELPHI ↓ | Alert-Base ↓ | Alert-Adv ↓ |
 |------------------------------|:--------------:|:--------:|:------------:|:-----------:|
 | Qwen-2.5-72B-Instruct        |     0.235      |  0.051   |    0.329     |    0.050    |
+| Qwen-2.5-72B-Instruct-Egida-DPO       |     0.125      |  0.042   |    0.210     |    0.019    |
 ### General Purpose Performance
 |                              | OpenLLM Leaderboard (Average) ↑ | MMLU Generative (ROUGE1) ↑ |
 |------------------------------|:---------------------:|:---------------:|
 | Qwen-2.5-72B-Instruct        |         0.618         |      0.771      |
+| Qwen-2.5-72B-Instruct-Egida-DPO       |         0.620         |      0.768      |
+### Refusal Ratio
+|                              | OR Bench 80K (refusal) ↓ | OR Bench Hard (refusal) ↓ |
+|------------------------------|:---------------------:|:---------------:|
+| Qwen-2.5-7B-Instruct         |          0.015           |           0.102           |
+| Qwen-2.5-7B-Instruct-Egida-DPO        |          0.016           |           0.170           |
+Note that this refusal ratio is computed by keyword matching against a curated list of keywords. For more information, check the paper.
 ## Environmental Impact
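
The note added in this commit says the refusal ratio is computed by keyword matching against a curated list. A minimal Python sketch of that idea follows. It assumes case-insensitive substring matching and uses a hypothetical keyword list; the actual curated list and matching rules are described in the paper, not in this diff, and the function names here are illustrative.

```python
# Hypothetical refusal markers for illustration only;
# this is NOT the curated list used for the reported numbers.
REFUSAL_KEYWORDS = ("i'm sorry", "i cannot", "i can't", "as an ai")


def is_refusal(response, keywords=REFUSAL_KEYWORDS):
    """Return True if any keyword occurs in the response (case-insensitive)."""
    text = response.lower()
    return any(keyword in text for keyword in keywords)


def refusal_ratio(responses, keywords=REFUSAL_KEYWORDS):
    """Fraction of responses flagged as refusals; 0.0 for an empty list."""
    if not responses:
        return 0.0
    flagged = sum(is_refusal(r, keywords) for r in responses)
    return flagged / len(responses)


if __name__ == "__main__":
    sample = [
        "I'm sorry, I cannot help with that.",
        "Sure, here is a recipe.",
        "I can't assist with this request.",
        "Here you go.",
    ]
    print(refusal_ratio(sample))
```

Keyword matching of this kind is cheap but approximate: a response that declines without using any listed phrase is missed, and a helpful response that happens to contain one is counted as a refusal.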