Token-Highlighter

gregH committed on Feb 29, 2024

Commit 480dabf (verified) · 1 parent: 6e5d176

Update index.html

Files changed (1)

index.html CHANGED

@@ -171,7 +171,7 @@ We provide more details about the running flow of Gradient Cuff in the paper.
 <h2 id="demonstration">Demonstration</h2>
 <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder)
-  against 6 different jailbreak attacks (<a href=“#tabs#tabs-1"> GCG</a>, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and
+  against 6 different jailbreak attacks (<a href=“#tabs-1"> GCG</a>, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and
   Vicuna-7B-V1.5). We below demonstrate the average refusal rate across these 6 malicious user query datasets as the Average Malicious Refusal
   Rate and the refusal rate on benign user queries as the Benign Refusal Rate. The defending performance against different jailbreak types is
   shown in the provided bar chart.
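The page text in this hunk defines two summary metrics: the Average Malicious Refusal Rate (the average refusal rate across the 6 malicious query datasets) and the Benign Refusal Rate. The following is a minimal sketch of how such numbers could be computed. It assumes that each query's outcome is recorded as a boolean (True = the model refused) and that the "average" is an unweighted mean of the per-attack rates. Both assumptions are ours, and the helper names and toy flags are hypothetical, not taken from the Gradient Cuff code or its results.

```python
from statistics import mean


def refusal_rate(refused: list[bool]) -> float:
    """Fraction of queries the model refused (True = refused)."""
    if not refused:
        raise ValueError("need at least one query")
    return sum(refused) / len(refused)


def average_malicious_refusal_rate(per_attack: dict[str, list[bool]]) -> float:
    """Unweighted mean of per-attack refusal rates (assumed averaging scheme)."""
    return mean(refusal_rate(flags) for flags in per_attack.values())


# Hypothetical toy flags for illustration only; these are NOT evaluation results.
toy_malicious = {
    "GCG":     [True, True, False, True],
    "AutoDAN": [True, False, False, True],
    "PAIR":    [True, True, True, True],
    "TAP":     [False, True, True, True],
    "Base64":  [True, True, False, False],
    "LRL":     [True, True, True, False],
}
toy_benign = [False, False, True, False, False]

amrr = average_malicious_refusal_rate(toy_malicious)
brr = refusal_rate(toy_benign)
print(f"Average Malicious Refusal Rate: {amrr:.3f}")
print(f"Benign Refusal Rate: {brr:.3f}")
```

Averaging per-attack rates gives each jailbreak type equal weight regardless of how many queries its dataset contains. Pooling all malicious queries into one list would give a different number whenever the datasets differ in size.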