Update index.html

index.html CHANGED (+8 -59)
@@ -237,7 +237,7 @@ should begin with 'Sure, here is guidance on how to cause physical harm to oth
237      </script>
238
239
240 -    <h2 id="proposed-approach-gradient-cuff">Performance
241      <p>
242      The performance of jailbreak defending methods is usually measured by how much they can reduce the ASR. Major concerns
243      when developing such methods are the performance degradation of the LLM on nominal benign prompts and the increased inference-time cost
@@ -269,70 +269,19 @@ ASR increase. When &beta; is fixed, larger &alpha; would both reduce the ASR and t
269      </div>
270
271      <h2 id="demonstration">Demonstration</h2>
272 -    <p>We evaluated Gradient Cuff as well as 4 baselines (Perplexity Filter, SmoothLLM, Erase-and-Check, and Self-Reminder)
273 -    against 6 different jailbreak attacks (GCG, AutoDAN, PAIR, TAP, Base64, and LRL) and benign user queries on 2 LLMs (LLaMA-2-7B-Chat and
274 -    Vicuna-7B-V1.5). We below demonstrate the average refusal rate across these 6 malicious user query datasets as the Average Malicious Refusal
275 -    Rate and the refusal rate on benign user queries as the Benign Refusal Rate. The defending performance against different jailbreak types is
276 -    shown in the provided bar chart.
277 -    </p>
278 -
279 -
280 -    <div id="jailbreak-demo" class="container">
281 -    <div class="row align-items-center">
282 -    <div class="row" style="margin: 10px 0 0">
283 -    <div class="models-list">
284 -    <span style="margin-right: 1em;">Models</span>
285 -    <span class="radio-group"><input type="radio" id="LLaMA2" class="options" name="models" value="llama2_7b_chat" checked="" /><label for="LLaMA2" class="option-label">LLaMA-2-7B-Chat</label></span>
286 -    <span class="radio-group"><input type="radio" id="Vicuna" class="options" name="models" value="vicuna_7b_v1.5" /><label for="Vicuna" class="option-label">Vicuna-7B-V1.5</label></span>
287 -    </div>
288 -    </div>
289 -    </div>
290 -    <div class="row align-items-center">
291 -    <div class="col-4">
292 -    <div id="defense-methods">
293 -    <div class="row align-items-center"><input type="radio" id="defense_ppl" class="options" name="defense" value="ppl" /><label for="defense_ppl" class="defense">Perplexity Filter</label></div>
294 -    <div class="row align-items-center"><input type="radio" id="defense_smoothllm" class="options" name="defense" value="smoothllm" /><label for="defense_smoothllm" class="defense">SmoothLLM</label></div>
295 -    <div class="row align-items-center"><input type="radio" id="defense_erase_check" class="options" name="defense" value="erase_check" /><label for="defense_erase_check" class="defense">Erase-Check</label></div>
296 -    <div class="row align-items-center"><input type="radio" id="defense_self_reminder" class="options" name="defense" value="self_reminder" /><label for="defense_self_reminder" class="defense">Self-Reminder</label></div>
297 -    <div class="row align-items-center"><input type="radio" id="defense_gradient_cuff" class="options" name="defense" value="gradient_cuff" checked="" /><label for="defense_gradient_cuff" class="defense"><span style="font-weight: bold;">Gradient Cuff</span></label></div>
298 -    </div>
299 -    <div class="row align-items-center">
300 -    <div class="attack-success-rate"><span class="jailbreak-metric">Average Malicious Refusal Rate</span><span class="attack-success-rate-value" id="asr-value">0.959</span></div>
301 -    </div>
302 -    <div class="row align-items-center">
303 -    <div class="benign-refusal-rate"><span class="jailbreak-metric">Benign Refusal Rate</span><span class="benign-refusal-rate-value" id="brr-value">0.050</span></div>
304 -    </div>
305 -    </div>
306 -    <div class="col-8">
307 -    <figure class="figure">
308 -    <img id="reliability-diagram" src="demo_results/gradient_cuff_llama2_7b_chat_threshold_100.png" alt="CIFAR-100 Calibrated Reliability Diagram (Full)" />
309 -    <div class="slider-container">
310 -    <div class="slider-label"><span>Perplexity Threshold</span></div>
311 -    <div class="slider-content" id="ppl-slider"><div id="ppl-threshold" class="ui-slider-handle"></div></div>
312 -    </div>
313 -    <div class="slider-container">
314 -    <div class="slider-label"><span>Gradient Threshold</span></div>
315 -    <div class="slider-content" id="gradient-norm-slider"><div id="gradient-norm-threshold" class="slider-value ui-slider-handle"></div></div>
316 -    </div>
317 -    <figcaption class="figure-caption">
318 -    </figcaption>
319 -    </figure>
320 -    </div>
321 -    </div>
322 -    </div>
323 -
324      <p>
325 -
326 -
327 -
328      </p>
329
330 -    <h2 id="inquiries"> Inquiries
331 -    <p> Please contact <a href="Mailto:[email protected]">Xiaomeng Hu</a>
332      and <a href="Mailto:[email protected]">Pin-Yu Chen</a>
333      </p>
334      <h2 id="citations">Citations</h2>
335 -    <p>If you find
336
337      <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{DBLP:journals/corr/abs-2412-18171,
338      author = {Xiaomeng Hu and
237      </script>
238
239
240 +    <h2 id="proposed-approach-gradient-cuff">Performance Evaluation</h2>
241      <p>
242      The performance of jailbreak defending methods is usually measured by how much they can reduce the ASR. Major concerns
243      when developing such methods are the performance degradation of the LLM on nominal benign prompts and the increased inference-time cost
269      </div>
270
271      <h2 id="demonstration">Demonstration</h2>
272      <p>
273 +    Below, we demonstrate how Token Highlighter influences the output generation of a large language model (LLM) in response to user prompts.
274 +    We showcase four illustrative examples: a GCG jailbreak, a TAP jailbreak, a vanilla harmful behavior, and a benign user request.
275 +    Additionally, we have developed a private live demo that allows users to interact with LLMs enhanced by Token Highlighter. Stay tuned for its public release!
276      </p>
277
278 +    <h2 id="inquiries">Inquiries</h2>
279 +    <p>If you have any questions regarding Token Highlighter, please contact <a href="Mailto:[email protected]">Xiaomeng Hu</a>
280      and <a href="Mailto:[email protected]">Pin-Yu Chen</a>
281      </p>
282 +
283      <h2 id="citations">Citations</h2>
284 +    <p>If you find Token Highlighter helpful and useful for your research, please cite our main paper as follows:</p>
285
286      <div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>@article{DBLP:journals/corr/abs-2412-18171,
287      author = {Xiaomeng Hu and