Jude Khouja committed on
Commit
4f2ad60
·
1 Parent(s): 6226c1b

Fix loading sort and minor description update

Files changed (2)
  1. app.py +1 -1
  2. data_loader.py +2 -2
app.py CHANGED
@@ -27,7 +27,7 @@ def create_app():
  # Initial load
  app.load(
  fn=lambda: filter_leaderboard(
- df, "Score after obfuscation"
+ df, "Score on obfuscated questions"
  ),
  outputs=[lb_output],
  )
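Note on the app.py hunk: the second argument to filter_leaderboard appears to name the column used for the initial sort, which would explain why changing the label fixes the "loading sort" from the commit message. The body of filter_leaderboard is not part of this diff, so the following is only a minimal sketch of that idea under stated assumptions. The helper name sort_rows_by_column, the use of plain dicts in place of the app's df, the descending order, and the example rows are all hypothetical.

```python
from typing import Any


def sort_rows_by_column(rows: list[dict[str, Any]], sort_by: str) -> list[dict[str, Any]]:
    """Return rows sorted by `sort_by`, highest value first (hypothetical helper)."""
    if rows and sort_by not in rows[0]:
        # A label that matches no column fails loudly instead of
        # silently leaving the rows in their original order.
        raise KeyError(f"unknown sort column: {sort_by!r}")
    return sorted(rows, key=lambda row: row[sort_by], reverse=True)


# Made-up rows for illustration only; these are not real leaderboard results.
rows = [
    {"Model": "model-a", "Score on obfuscated questions": 0.21},
    {"Model": "model-b", "Score on obfuscated questions": 0.35},
]
ranked = sort_rows_by_column(rows, "Score on obfuscated questions")
```

In this sketch, passing a label that is not a column (for example the old "Score after obfuscation" string, if no such column existed) raises KeyError rather than returning unsorted rows. Whether the real function behaves this way is not shown in the diff.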
data_loader.py CHANGED
@@ -269,8 +269,8 @@ HEADER_CONTENT = (
  </div>
 
  <div class="description">
- LingOly-TOO (L2) is a challenging linguistics reasoning benchmark designed to counteracts answering without reasoning (e.g. by guessing or memorizing answers).
- We permute <b>Ling</b>uistics <b>Oly</b>mpiad problems with <b>T</b>emplates and <b>O</b>rthographic <b>O</b>bfuscations. By rewriting (obfuscating) parts of questions and answers, the chance of benchmark leakage in training data is minimized.
+ LingOly-TOO (L2) is a challenging linguistics reasoning benchmark designed to counteracts answering without reasoning (e.g. by guessing or memorizing answers).
+ We accomplish this by permuting <b>Ling</b>uistics <b>Oly</b>mpiad problems with <b>T</b>emplates and <b>O</b>rthographic <b>O</b>bfuscations. By rewriting (obfuscating) parts of questions and answers, the chance of benchmark leakage in training data is minimized.
  <div class="highlight-question">
  "How do top LLMs reason on unseen linguistic questions?"
  </div>