Update index.html
index.html (CHANGED, +8 -2)
@@ -238,8 +238,14 @@ should begin with ’Sure, here is guidance on how to cause physical harm to oth
 
 
 <h2 id="proposed-approach-gradient-cuff">Performance evaluation against practical Jailbreaks</h2>
-<p>
-
+<p>
+The performance of jailbreak defense methods is usually measured by how much they reduce the attack success rate (ASR). The major concerns
+when developing such methods are the performance degradation of the LLM on nominal benign prompts and the increased inference time cost.
+We compare our method with existing defense methods on Vicuna-7B-V1.5, jointly considering the ASR, Win Rate, and running time cost. In the
+plot shown below, the horizontal axis represents the ASR averaged over 6 jailbreak attacks (GCG, AutoDAN,
+PAIR, TAP, Manyshot, and AIM), and the vertical axis shows the Win Rate on AlpacaEval of the
+protected LLM when the corresponding defense is deployed. The value printed next to each marker is the running time
+averaged across the 25 samples. A larger marker indicates a lower running time cost.
 </p>
 
 <div class="container"><img id="gradient-cuff-header" src="./gradient_cuff.png" /></div>
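The plot described in this commit combines three per-defense quantities: the ASR averaged over the six attacks (x-axis), the AlpacaEval Win Rate (y-axis, taken as given here), and the mean running time over the evaluated samples (marker label and size). The following is a minimal Python sketch of how those quantities could be derived, assuming per-attack ASR values are fractions in [0, 1] and running times are recorded per sample in a single consistent unit. The function names and the inverse-proportional marker-size mapping are hypothetical: the text only states that a larger marker means a lower running time, not how size is computed.

```python
from statistics import mean

# The six attacks named in the text; ASR is averaged over exactly these.
ATTACKS = ("GCG", "AutoDAN", "PAIR", "TAP", "Manyshot", "AIM")


def average_asr(asr_by_attack: dict[str, float]) -> float:
    """x-axis value: mean ASR over the six attacks (hypothetical helper)."""
    missing = [a for a in ATTACKS if a not in asr_by_attack]
    if missing:
        # Refuse to average over an incomplete set of attacks.
        raise KeyError(f"missing ASR for: {missing}")
    return mean(asr_by_attack[a] for a in ATTACKS)


def average_runtime(per_sample_times: list[float]) -> float:
    """Printed marker label: mean running time across the evaluated samples."""
    return mean(per_sample_times)


def marker_sizes(runtimes: dict[str, float], max_size: float = 400.0) -> dict[str, float]:
    """Assumed mapping: size inversely proportional to running time.

    The fastest defense gets max_size; a defense twice as slow gets half of it.
    """
    fastest = min(runtimes.values())
    return {name: max_size * fastest / t for name, t in runtimes.items()}
```

The inverse mapping is only one way to satisfy "larger marker means lower running time"; any monotonically decreasing function of running time (for example a log scale when times differ by orders of magnitude) would fit the description equally well.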