Jude Khouja committed · Commit 4f2ad60 · 1 Parent(s): 6226c1b
Fix loading sort and minor description update
- app.py +1 -1
- data_loader.py +2 -2
app.py CHANGED
@@ -27,7 +27,7 @@ def create_app():
     # Initial load
     app.load(
         fn=lambda: filter_leaderboard(
-            df, "Score
+            df, "Score on obfuscated questions"
         ),
         outputs=[lb_output],
     )
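The app.py change passes a full column name as the sort key for the initial load. The body of `filter_leaderboard` is not part of this diff, so the following is only a minimal sketch of how such a sort key might be used. It assumes the function sorts by the named column in descending order; the rows are plain dicts standing in for the DataFrame, and the model names and scores are made up.

```python
# Hypothetical stand-in for the Space's filter_leaderboard; rows are plain
# dicts instead of a DataFrame so the sketch has no third-party dependencies.
def filter_leaderboard(rows, sort_by):
    """Return rows sorted by `sort_by`, highest value first."""
    if not rows:
        return []
    if sort_by not in rows[0]:
        # An incomplete or misspelled column name fails loudly here.
        raise KeyError(f"unknown sort column: {sort_by!r}")
    return sorted(rows, key=lambda row: row[sort_by], reverse=True)


SORT_COLUMN = "Score on obfuscated questions"  # name taken from the diff

# Made-up models and scores, for illustration only.
rows = [
    {"Model": "model-a", SORT_COLUMN: 0.21},
    {"Model": "model-b", SORT_COLUMN: 0.48},
]

ranked = filter_leaderboard(rows, SORT_COLUMN)
```

Under this assumption, a sort key that does not exactly match a column name raises an error rather than silently loading an unsorted table.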
data_loader.py CHANGED
@@ -269,8 +269,8 @@ HEADER_CONTENT = (
     </div>
 
     <div class="description">
-        LingOly-TOO (L2) is a challenging linguistics reasoning benchmark designed to counteracts answering without reasoning (e.g. by guessing or memorizing answers).
-        We
+        LingOly-TOO (L2) is a challenging linguistics reasoning benchmark designed to counteracts answering without reasoning (e.g. by guessing or memorizing answers).
+        We accomplish this by permuting <b>Ling</b>uistics <b>Oly</b>mpiad problems with <b>T</b>emplates and <b>O</b>rthographic <b>O</b>bfuscations. By rewriting (obfuscating) parts of questions and answers, the chance of benchmark leakage in training data is minimized.
         <div class="highlight-question">
             "How do top LLMs reason on unseen linguistic questions?"
         </div>
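The description string in the data_loader.py diff mentions orthographic obfuscation: rewriting parts of questions and answers so that the surface forms no longer match anything seen in training data. The toy sketch below illustrates that general idea with a consistent letter-for-letter substitution. It is not the benchmark's actual procedure; the function names, the alphabet, and the seed are all assumptions made for illustration.

```python
import random


def make_obfuscation(alphabet, seed):
    """Build a consistent one-to-one letter substitution over `alphabet`."""
    shuffled = list(alphabet)
    random.Random(seed).shuffle(shuffled)  # seeded, so the mapping is reproducible
    return dict(zip(alphabet, shuffled))


def obfuscate(text, mapping):
    """Rewrite `text` letter by letter; characters outside the mapping are kept."""
    return "".join(mapping.get(ch, ch) for ch in text)


# Hypothetical alphabet and seed. The same mapping is applied to the question
# and to its answer, so the two stay consistent with each other.
mapping = make_obfuscation("abcdefghijklmnopqrstuvwxyz", seed=0)
question_word = obfuscate("kitabu", mapping)
answer_word = obfuscate("vitabu", mapping)
```

Because the substitution is one-to-one, the pattern linking the two words (a shared ending, a differing first letter) survives the rewrite, while the literal strings change.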