Commit ee28555 by danihinjos · verified · 1 Parent(s): c73e0b2

Update README.md

Files changed (1)
  1. README.md +21 -2
README.md CHANGED
@@ -1,5 +1,13 @@
  ---
  license: apache-2.0
+ datasets:
+ - HPAI-BSC/Egida
+ language:
+ - en
+ base_model:
+ - Qwen/Qwen2.5-72B-Instruct
+ tags:
+ - safety
  ---
 
  ## Model Description
@@ -31,14 +39,25 @@ dataset for this model. This results in a DPO dataset composed by triplets < ”
  | | Egida (test) ↓ | DELPHI ↓ | Alert-Base ↓ | Alert-Adv ↓ |
  |------------------------------|:--------------:|:--------:|:------------:|:-----------:|
  | Qwen-2.5-72B-Instruct | 0.235 | 0.051 | 0.329 | 0.050 |
- | Qwen-2.5-72B-Egida-DPO | 0.125 | 0.042 | 0.210 | 0.019 |
+ | Qwen-2.5-72B-Instruct-Egida-DPO | 0.125 | 0.042 | 0.210 | 0.019 |
 
  ### General Purpose Performance
 
  | | OpenLLM Leaderboard (Average) ↑ | MMLU Generative (ROUGE1) ↑ |
  |------------------------------|:---------------------:|:---------------:|
  | Qwen-2.5-72B-Instruct | 0.618 | 0.771 |
- | Qwen-2.5-72B-Egida-DPO | 0.620 | 0.768 |
+ | Qwen-2.5-72B-Instruct-Egida-DPO | 0.620 | 0.768 |
+
+ ### Refusal Ratio
+
+ | | OR Bench 80K (refusal) ↓ | OR Bench Hard (refusal) ↓ |
+ |------------------------------|:---------------------:|:---------------:|
+ | Qwen-2.5-7B-Instruct | 0.015 | 0.102 |
+ | Qwen-2.5-7B-Instruct-Egida-DPO | 0.016 | 0.170 |
+
+ Note that this refusal ratio is computed as keyword matching with a curated list of keywords. For more information, check the paper.
+
+
 
  ## Environmental Impact
 
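
The README note added in this commit says the refusal ratio is computed by keyword matching against a curated list. As a rough illustration of that kind of metric, here is a minimal Python sketch. The keyword list and the function names `is_refusal` and `refusal_ratio` are hypothetical placeholders, not the curated list or code behind the reported numbers; those are described in the paper.

```python
from typing import Sequence

# Hypothetical keywords for illustration only; NOT the curated list
# used to produce the numbers in the README.
REFUSAL_KEYWORDS: Sequence[str] = (
    "i'm sorry",
    "i cannot",
    "i can't",
    "i am unable",
)


def is_refusal(response: str, keywords: Sequence[str] = REFUSAL_KEYWORDS) -> bool:
    """Return True if the response contains any refusal keyword (case-insensitive)."""
    # Normalize case and curly apostrophes so "I’m sorry" matches "i'm sorry".
    text = response.lower().replace("\u2019", "'")
    return any(keyword in text for keyword in keywords)


def refusal_ratio(responses: Sequence[str],
                  keywords: Sequence[str] = REFUSAL_KEYWORDS) -> float:
    """Fraction of responses flagged as refusals; 0.0 for an empty input."""
    if not responses:
        return 0.0
    flagged = sum(is_refusal(r, keywords) for r in responses)
    return flagged / len(responses)


if __name__ == "__main__":
    sample = [
        "I'm sorry, but I can't help with that.",
        "Sure, here is a simple bread recipe.",
    ]
    print(refusal_ratio(sample))
```

Substring matching of this kind is cheap but coarse: it can count a helpful answer that happens to contain an apology as a refusal, so the resulting ratio is best read as an approximation.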