lingoly-too

Jude Khouja committed 7 days ago

Commit

4f2ad60

1 Parent(s): 6226c1b

Fix loading sort and minor description update

Files changed (2)

app.py CHANGED

@@ -27,7 +27,7 @@ def create_app():
         # Initial load
         app.load(
             fn=lambda: filter_leaderboard(
-                df, "Score after obfuscation"
+                df, "Score on obfuscated questions"
             ),
             outputs=[lb_output],
         )

data_loader.py CHANGED

@@ -269,8 +269,8 @@ HEADER_CONTENT = (
         </div>
         <div class="description">
-            LingOly-TOO (L2) is a challenging linguistics reasoning benchmark designed to counteracts answering without reasoning (e.g. by guessing or memorizing answers).
-            We permute <b>Ling</b>uistics <b>Oly</b>mpiad problems with <b>T</b>emplates and <b>O</b>rthographic <b>O</b>bfuscations. By rewriting (obfuscating) parts of questions and answers, the chance of benchmark leakage in training data is minimized.
+            LingOly-TOO (L2) is a challenging linguistics reasoning benchmark designed to counteracts answering without reasoning (e.g. by guessing or memorizing answers).
+            We accomplish this by permuting <b>Ling</b>uistics <b>Oly</b>mpiad problems with <b>T</b>emplates and <b>O</b>rthographic <b>O</b>bfuscations. By rewriting (obfuscating) parts of questions and answers, the chance of benchmark leakage in training data is minimized.
             <div class="highlight-question">
                 "How do top LLMs reason on unseen linguistic questions?"
             </div>