gregH committed on
Commit 1eedde9 · verified · 1 Parent(s): dde0484

Update index.html

Files changed (1)
  1. index.html +2 -16
index.html CHANGED
@@ -245,24 +245,10 @@ should begin with ’Sure, here is guidance on how to cause physical harm to oth
  plot shown below, the horizon axis represents the ASR averaged over 6 jailbreak attacks (GCG, AutoDAN,
  PAIR, TAP, Manyshot, and AIM), and the vertica axis shows the Win Rate on Alpaca Eval of the
  protected LLM when the corresponding defense is deployed. The printed value for each marker is the running time
- averaged across the 25 samples. Larger size of a marker means lower running time cost.
+ averaged across the 25 samples selected from the AlpacaEval dataset. Larger size of a marker means lower running time cost.
  </p>
 
- <div class="container"><img id="gradient-cuff-header" src="./gradient_cuff.png" /></div>
+ <div class="container"><img id="gradient-cuff-header" src="./running_time_analysis.png" /></div>
-
- <p>
- Gradient Cuff can be summarized into two phases:
- </p>
- <p>
- <strong>(Phase 1) Sampling-based Rejection:</strong> In the first step, we reject the user query by checking whether the Refusal Loss value is below 0.5. If true, then user query is rejected, otherwise, the user query is pushed into phase 2.
- </p>
- <p>
- <strong>(Phase 2) Gradient Norm Rejection:</strong> In the second step, we regard the user query as having jailbreak attempts if the norm of the estimated gradient is larger than a configurable threshold t.
- </p>
-
- <p>
- We provide more details about the running flow of Gradient Cuff in the paper.
- </p>
 
  <h2 id="demonstration">Demonstration</h2>
  <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder)