gregH committed on
Commit 2aae5e7 · verified
1 Parent(s): c605bca

Update index.html

Files changed (1)
  1. index.html +2 -2
index.html CHANGED
@@ -81,8 +81,8 @@ Exploring Refusal Loss Landscapes </title>
  <p>Current transformer-based LLMs will return different responses to the same query due to the randomness of
  autoregressive sampling-based generation. With this randomness, it is an
  interesting phenomenon that a malicious user query will sometimes be rejected by the target LLM, but
- sometimes be able to bypass the safety guardrail. Based on this observation, for a given LLM $$T_\theta$$ parameterized with $\theta$, we
- define the refusal loss function $\phi_\theta(x)$ for a given input user query $x$ as below:
+ sometimes be able to bypass the safety guardrail. Based on this observation, for a given LLM <p>$T_\theta$</p>
+ parameterized with $\theta$, we define the refusal loss function $\phi_\theta(x)$ for a given input user query $x$ as below:
  </p>
 
  <div class="container jailbreak-intro-sec">
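
Review note on the added lines: the `+` side wraps `$T_\theta$` in its own `<p>` while it is still inside the paragraph opened at the top of this hunk. HTML does not allow a `<p>` inside a `<p>`. A parser closes the outer paragraph when it reaches the inner `<p>`, so the sentence is split across separate blocks and the final `</p>` of the hunk is left as a stray empty paragraph. If the intent was only to switch from display math (`$$…$$`) to inline math, a minimal sketch of a well-formed alternative follows. It assumes the page's math renderer is configured to treat single `$` as an inline delimiter, which the neighbouring `$\theta$` and `$\phi_\theta(x)$` suggest but this diff does not confirm.

```html
<!-- Hypothetical alternative for the two changed lines: inline math, no nested <p>. -->
<!-- Assumes single-dollar inline math is enabled in the page's math renderer config. -->
  sometimes be able to bypass the safety guardrail. Based on this observation, for a given LLM $T_\theta$
  parameterized with $\theta$, we define the refusal loss function $\phi_\theta(x)$ for a given input user query $x$ as below:
```

If a wrapper element is needed for styling, an inline element such as `<span>` keeps the paragraph intact where a nested `<p>` does not.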