danihinjos committed
Commit 2550c5f · verified · 1 Parent(s): e581917

Update README.md

Files changed (1): README.md (+17, -2)
README.md CHANGED
@@ -1,5 +1,11 @@
  ---
  license: apache-2.0
+ datasets:
+ - HPAI-BSC/Egida
+ language:
+ - en
+ base_model:
+ - meta-llama/Llama-3.1-70B-Instruct
  ---
  
  ## Model Description
@@ -31,16 +37,25 @@ dataset for this model. This results in a DPO dataset composed by triplets < ”
  |                                        | Egida (test) ↓ | DELPHI ↓ | Alert-Base ↓ | Alert-Adv ↓ |
  |----------------------------------------|:--------------:|:--------:|:------------:|:-----------:|
  | Meta-Llama-3.1-70B-Instruct            | 0.274          | 0.170    | 0.320        | 0.084       |
- | Meta-Llama-3.1-70B-Egida-DPO           | 0.009          | 0.007    | 0.006        | 0.005       |
+ | Meta-Llama-3.1-70B-Instruct-Egida-DPO  | 0.009          | 0.007    | 0.006        | 0.005       |
  
  ### General Purpose Performance
  
  |                                        | OpenLLM Leaderboard (Average) ↑ | MMLU Generative (ROUGE1) ↑ |
  |----------------------------------------|:-------------------------------:|:--------------------------:|
  | Meta-Llama-3.1-70B-Instruct            | 0.575                           | 0.726                      |
- | Meta-Llama-3.1-70B-Egida-DPO           | 0.577                           | 0.038                      |
+ | Meta-Llama-3.1-70B-Instruct-Egida-DPO  | 0.577                           | 0.038                      |
  
  
+ ### Refusal Ratio
+ 
+ |                                        | OR Bench 80K (refusal) ↓ | OR Bench Hard (refusal) ↓ |
+ |----------------------------------------|:------------------------:|:-------------------------:|
+ | Meta-Llama-3.1-70B-Instruct            | 0.008                    | 0.022                     |
+ | Meta-Llama-3.1-70B-Instruct-Egida-DPO  | 0.347                    | 0.351                     |
+ 
+ Note that the refusal ratio is computed via keyword matching against a curated list of refusal keywords. For more details, see the paper.
+ 
  ## Environmental Impact
  
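
The added front matter declares the `HPAI-BSC/Egida` dataset and the `meta-llama/Llama-3.1-70B-Instruct` base model, and the second hunk's context mentions a DPO dataset built from triplets. Below is a minimal, hedged sketch (not the authors' training code) of loading that dataset and shaping it into the prompt/chosen/rejected triplet form that DPO-style preference tuning typically expects; the split and field names are assumptions to verify against the actual dataset.

```python
# Minimal sketch: load the dataset referenced in the card metadata and inspect it
# before building DPO-style triplets. Split and column names below are assumptions.
from datasets import load_dataset

ds = load_dataset("HPAI-BSC/Egida")        # dataset id taken from the card metadata
print(ds)                                  # check the available splits
first_split = next(iter(ds))
print(ds[first_split].column_names)        # check the actual column names

# Hypothetical mapping into the <prompt, chosen, rejected> triplet format commonly
# expected by DPO trainers; replace the field names with the dataset's real columns.
def to_dpo_triplet(example):
    return {
        "prompt": example["prompt"],       # assumed field name
        "chosen": example["chosen"],       # assumed field name
        "rejected": example["rejected"],   # assumed field name
    }
```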
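The refusal-ratio note added in this commit says the metric is obtained by keyword matching against a curated keyword list. The sketch below shows that kind of check under a simple substring-matching assumption; the keywords listed are illustrative placeholders, not the paper's curated list.

```python
# Illustrative refusal check: a response counts as a refusal if it contains any
# phrase from a keyword list (placeholders below, not the paper's curated list).
REFUSAL_KEYWORDS = [
    "I cannot", "I can't", "I'm sorry", "I am sorry",
    "as an AI", "I'm not able to", "I am not able to",
]

def is_refusal(response: str) -> bool:
    text = response.lower()
    return any(kw.lower() in text for kw in REFUSAL_KEYWORDS)

def refusal_ratio(responses: list[str]) -> float:
    # Fraction of responses flagged as refusals, as reported in the OR Bench columns.
    return sum(is_refusal(r) for r in responses) / max(len(responses), 1)
```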