gregH committed · verified
Commit 2faf626 · 1 Parent(s): 611f025

Update index.html

Files changed (1): index.html (+8 -2)
index.html CHANGED
@@ -238,8 +238,14 @@ should begin with ’Sure, here is guidance on how to cause physical harm to oth
 
 
  <h2 id="proposed-approach-gradient-cuff">Performance evaluation against practical Jailbreaks</h2>
- <p> With the exploration of the Refusal Loss landscape, we propose Gradient Cuff,
- a two-step jailbreak detection method based on checking the refusal loss and its gradient norm. Our detection procedure is shown below:
+ <p>
+ The performance of jailbreak defense methods is usually measured by how much they reduce the Attack Success Rate (ASR). Major concerns
+ when developing such methods are the performance degradation of the LLM on benign prompts and the increased inference time cost.
+ We test our method on Vicuna-7B-V1.5 alongside existing defense methods, jointly considering the ASR, Win Rate, and running time cost. In the
+ plot shown below, the horizontal axis represents the ASR averaged over 6 jailbreak attacks (GCG, AutoDAN,
+ PAIR, TAP, Manyshot, and AIM), and the vertical axis shows the Win Rate on AlpacaEval of the
+ protected LLM when the corresponding defense is deployed. The printed value for each marker is the running time
+ averaged across the 25 samples. A larger marker size indicates a lower running time cost.
  </p>
 
  <div class="container"><img id="gradient-cuff-header" src="./gradient_cuff.png" /></div>
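
For intuition, the figure the new paragraph describes can be sketched schematically. The following is a minimal matplotlib sketch of such a plot; the defense names and every number below are hypothetical placeholders, not the evaluation's results (those live in the figure this commit references, not in the diff):

# Sketch of the ASR-vs-Win-Rate scatter plot described above.
# All defenses and values are placeholders, NOT actual results.
import matplotlib.pyplot as plt

defenses = ["No defense", "Gradient Cuff"]   # hypothetical subset of methods
avg_asr = [0.70, 0.10]     # placeholder: ASR averaged over the 6 attacks
win_rate = [0.80, 0.75]    # placeholder: Win Rate on AlpacaEval
runtime_s = [1.0, 4.0]     # placeholder: avg. running time per sample (s)

# Larger marker = lower running time cost, as in the figure.
sizes = [2000.0 / t for t in runtime_s]

fig, ax = plt.subplots()
ax.scatter(avg_asr, win_rate, s=sizes, alpha=0.6)
for x, y, name, t in zip(avg_asr, win_rate, defenses, runtime_s):
    # Print the running time next to each marker, as the paragraph describes.
    ax.annotate(f"{name} ({t:.1f}s)", (x, y))
ax.set_xlabel("Average ASR over 6 jailbreak attacks")
ax.set_ylabel("Win Rate on AlpacaEval")
plt.show()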