gregH committed · verified
Commit 2faf626 · 1 Parent(s): 611f025

Update index.html

Files changed (1): index.html (+8 -2)
index.html CHANGED
@@ -238,8 +238,14 @@ should begin with ’Sure, here is guidance on how to cause physical harm to oth
 
 
  <h2 id="proposed-approach-gradient-cuff">Performance evaluation against practical Jailbreaks</h2>
- <p> With the exploration of the Refusal Loss landscape, we propose Gradient Cuff,
- a two-step jailbreak detection method based on checking the refusal loss and its gradient norm. Our detection procedure is shown below:
+ <p>
+ The performance of jailbreak defense methods is usually measured by how much they reduce the Attack Success Rate (ASR). Major concerns
+ when developing such methods are the performance degradation of the LLM on benign prompts and the increased inference time cost.
+ We test our method on Vicuna-7B-V1.5 alongside existing defense methods, jointly considering the ASR, Win Rate, and running time cost. In the
+ plot shown below, the horizontal axis represents the ASR averaged over 6 jailbreak attacks (GCG, AutoDAN,
+ PAIR, TAP, Manyshot, and AIM), and the vertical axis shows the Win Rate on AlpacaEval of the
+ protected LLM when the corresponding defense is deployed. The printed value for each marker is the running time
+ averaged across the 25 samples. A larger marker size indicates a lower running time cost.
  </p>
 
  <div class="container"><img id="gradient-cuff-header" src="./gradient_cuff.png" /></div>
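
For intuition, the figure the new paragraph describes can be sketched schematically. The following is a minimal matplotlib sketch of such a plot; the defense names and every number below are hypothetical placeholders, not the evaluation's results (those live in the figure this commit references, not in the diff):

# Sketch of the ASR-vs-Win-Rate scatter plot described above.
# All defenses and values are placeholders, NOT actual results.
import matplotlib.pyplot as plt

defenses = ["No defense", "Gradient Cuff"]   # hypothetical subset of methods
avg_asr = [0.70, 0.10]     # placeholder: ASR averaged over the 6 attacks
win_rate = [0.80, 0.75]    # placeholder: Win Rate on AlpacaEval
runtime_s = [1.0, 4.0]     # placeholder: avg. running time per sample (s)

# Larger marker = lower running time cost, as in the figure.
sizes = [2000.0 / t for t in runtime_s]

fig, ax = plt.subplots()
ax.scatter(avg_asr, win_rate, s=sizes, alpha=0.6)
for x, y, name, t in zip(avg_asr, win_rate, defenses, runtime_s):
    # Print the running time next to each marker, as the paragraph describes.
    ax.annotate(f"{name} ({t:.1f}s)", (x, y))
ax.set_xlabel("Average ASR over 6 jailbreak attacks")
ax.set_ylabel("Win Rate on AlpacaEval")
plt.show()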