Token-Highlighter

gregH committed on Feb 14

Commit 1eedde9 (verified)

1 Parent(s): dde0484

Update index.html

Files changed (1)

index.html CHANGED

@@ -245,24 +245,10 @@ should begin with ’Sure, here is guidance on how to cause physical harm to oth
   plot shown below, the horizon axis represents the ASR averaged over 6 jailbreak attacks (GCG, AutoDAN,
 PAIR, TAP, Manyshot, and AIM), and the vertica axis shows the Win Rate on Alpaca Eval of the
 protected LLM when the corresponding defense is deployed. The printed value for each marker is the running time
-averaged across the 25 samples. Larger size of a marker means lower running time cost.
+averaged across the 25 samples selected from the AlpacaEval dataset. Larger size of a marker means lower running time cost.
 </p>
-<div class="container"><img id="gradient-cuff-header" src="./gradient_cuff.png" /></div>
-<p>
-  Gradient Cuff can be summarized into two phases:
-</p>
-<p>
-    <strong>(Phase 1) Sampling-based Rejection:</strong> In the first step, we reject the user query by checking whether the Refusal Loss value is below 0.5. If true, then user query is rejected, otherwise, the user query is pushed into phase 2.
-</p>
-<p>
-    <strong>(Phase 2) Gradient Norm Rejection:</strong> In the second step, we regard the user query as having jailbreak attempts if the norm of the estimated gradient is larger than a configurable threshold t.
-</p>
-<p>
-We provide more details about the running flow of Gradient Cuff in the paper.
-</p>
+<div class="container"><img id="gradient-cuff-header" src="./running_time_analysis.png" /></div>
 <h2 id="demonstration">Demonstration</h2>
 <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder)
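
Note on the removed text: the two-phase flow described in the deleted paragraphs can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function name `gradient_cuff_rejects` and its parameters are hypothetical, the Refusal Loss value and the gradient estimate are assumed to be computed elsewhere (the removed text defers those details to the paper), and the Euclidean norm is assumed because the text only says "the norm".

```python
import math
from typing import Sequence


def gradient_cuff_rejects(
    refusal_loss: float,
    grad_estimate: Sequence[float],
    threshold_t: float,
) -> bool:
    """Return True if the user query should be rejected.

    Hypothetical helper: the inputs are assumed to be precomputed by
    whatever procedure estimates the Refusal Loss and its gradient.
    """
    # Phase 1 (sampling-based rejection): reject when the Refusal Loss
    # value is below 0.5.
    if refusal_loss < 0.5:
        return True

    # Phase 2 (gradient norm rejection): reject when the norm of the
    # estimated gradient exceeds the configurable threshold t.
    # Assumption: Euclidean (L2) norm.
    grad_norm = math.sqrt(sum(g * g for g in grad_estimate))
    return grad_norm > threshold_t
```

With this sketch, a query whose Refusal Loss is 0.3 is rejected in phase 1 regardless of its gradient, while a query with Refusal Loss 0.9 is rejected only if its gradient norm exceeds `threshold_t`.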