Update index.html
index.html CHANGED +2 -16
@@ -245,24 +245,10 @@ should begin with ’Sure, here is guidance on how to cause physical harm to oth
 plot shown below, the horizon axis represents the ASR averaged over 6 jailbreak attacks (GCG, AutoDAN,
 PAIR, TAP, Manyshot, and AIM), and the vertica axis shows the Win Rate on Alpaca Eval of the
 protected LLM when the corresponding defense is deployed. The printed value for each marker is the running time
-averaged across the 25 samples. Larger size of a marker means lower running time cost.
+averaged across the 25 samples selected from the AlpacaEval dataset. Larger size of a marker means lower running time cost.
 </p>

-<div class="container"><img id="gradient-cuff-header" src="./
-
-<p>
-Gradient Cuff can be summarized into two phases:
-</p>
-<p>
-<strong>(Phase 1) Sampling-based Rejection:</strong> In the first step, we reject the user query by checking whether the Refusal Loss value is below 0.5. If true, then user query is rejected, otherwise, the user query is pushed into phase 2.
-</p>
-<p>
-<strong>(Phase 2) Gradient Norm Rejection:</strong> In the second step, we regard the user query as having jailbreak attempts if the norm of the estimated gradient is larger than a configurable threshold t.
-</p>
-
-<p>
-We provide more details about the running flow of Gradient Cuff in the paper.
-</p>
+<div class="container"><img id="gradient-cuff-header" src="./running_time_analysis.png" /></div>

 <h2 id="demonstration">Demonstration</h2>
 <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder)
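Note on the removed text: the two-phase decision rule that this commit deletes from the page can be sketched as below. This is only a minimal illustration of the thresholding logic as worded in the removed lines, not the authors' implementation. The function name `gradient_cuff_decision`, the return labels, and the choice of the Euclidean norm are assumptions made here; how the Refusal Loss value and the gradient estimate are actually computed is left to the paper and is taken as given input.

```python
import math
from typing import Sequence


def gradient_cuff_decision(
    refusal_loss: float,
    grad_estimate: Sequence[float],
    t: float,
) -> str:
    """Hypothetical sketch of the two-phase rule described in the removed text.

    refusal_loss  -- Refusal Loss value for the user query (assumed precomputed)
    grad_estimate -- estimated gradient for the query (assumed precomputed)
    t             -- configurable threshold on the gradient norm
    """
    # Phase 1 (Sampling-based Rejection): reject when the Refusal Loss
    # value is below 0.5; otherwise the query moves on to phase 2.
    if refusal_loss < 0.5:
        return "reject_phase1"

    # Phase 2 (Gradient Norm Rejection): treat the query as a jailbreak
    # attempt when the norm of the estimated gradient exceeds t.
    # Assumption: Euclidean (L2) norm; the removed text only says "norm".
    grad_norm = math.sqrt(sum(g * g for g in grad_estimate))
    if grad_norm > t:
        return "reject_phase2"

    return "accept"
```

For example, under these assumptions a query with Refusal Loss 0.9 and gradient estimate `[3.0, 4.0]` has norm 5.0, so it is rejected in phase 2 when `t = 4.0` and accepted when `t = 5.0`, since the comparison is strict.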