|
TITLE = """# 🏅 DABStep Leaderboard""" |
|
|
|
INTRODUCTION_TEXT = """ |
|
The Data Agent Benchmark for Multi-step Reasoning (DABStep) aims to measure and advance the state of the art in data analysis by LLMs.
|
The benchmark consists of ~450 data analysis questions ([Dataset Link](https://huggingface.co/datasets/adyen/data-agents-benchmark)), each centered on one or more documents that agents must understand and cross-reference to answer correctly.
|
|
|
We provide a notebook for quickly building an agent baseline with the free Hugging Face Inference API: [Colab Notebook](https://colab.research.google.com/drive/1pXi5ffBFNJQ5nn1111SnIfjfKCOlunxu)
|
""" |
|
|
|
SUBMISSION_TEXT = """ |
|
## Submissions |
|
Scores are expressed as the percentage of correct answers. |
|
|
|
Unless specified otherwise, each question calls for an answer that is a string (one or a few words), a number, or a comma-separated list of strings or floats. There is only one correct answer.
|
Hence, evaluation uses a quasi-exact match between the model’s answer and the ground truth (up to some normalization that depends on the “type” of the ground truth).
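As a rough illustration of what type-dependent quasi-exact matching can look like, here is a minimal sketch. It is not the official scorer: the function names, the numeric tolerance `rel_tol`, the lowercase/whitespace normalization, and the order-sensitive list comparison are all assumptions made for this example. The scoring function linked in this section is the authoritative reference.

```python
import math


def normalize_text(value):
    # Assumed normalization: lowercase and collapse whitespace.
    return " ".join(str(value).strip().lower().split())


def is_number(value):
    # True when the string parses as a float.
    try:
        float(value)
        return True
    except ValueError:
        return False


def quasi_exact_match(agent_answer, ground_truth, rel_tol=1e-4):
    # The comparison rule is chosen from the "type" of the ground truth.
    if is_number(ground_truth):
        # Numeric ground truth: compare as floats within an assumed tolerance.
        return is_number(agent_answer) and math.isclose(
            float(agent_answer), float(ground_truth), rel_tol=rel_tol
        )
    if "," in ground_truth:
        # List ground truth: compare element by element, in order (assumption).
        truth_items = ground_truth.split(",")
        agent_items = agent_answer.split(",")
        return len(truth_items) == len(agent_items) and all(
            quasi_exact_match(a.strip(), t.strip(), rel_tol)
            for a, t in zip(agent_items, truth_items)
        )
    # String ground truth: compare after text normalization.
    return normalize_text(agent_answer) == normalize_text(ground_truth)


def score(agent_answers, ground_truths):
    # Percentage of correct answers; unanswered tasks count as wrong.
    correct = sum(
        quasi_exact_match(agent_answers.get(task_id, ""), truth)
        for task_id, truth in ground_truths.items()
    )
    return 100.0 * correct / len(ground_truths)
```

Under these assumptions, `"3.0"` matches a ground truth of `"3"`, and `"a, B,1.50"` matches `"A,b,1.5"`, while any differing item or list length counts as incorrect.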
|
|
|
|
|
Submissions must be JSON Lines (`.jsonl`) files with one JSON object per line, in the following format.

The `task_id` and `agent_answer` fields are mandatory; `reasoning_trace` is optional:
|
``` |
|
{"task_id": "task_id_1", "agent_answer": "Answer 1 from your agent", "reasoning_trace": "The different steps by which your model reached answer 1"} |
|
{"task_id": "task_id_2", "agent_answer": "Answer 2 from your agent", "reasoning_trace": "The different steps by which your model reached answer 2"} |
|
``` |
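A file in this format can be produced with the standard `json` module. The sketch below is one possible way to do it; the helper names `write_submission` and `read_submission` and the file name in the usage note are hypothetical and not part of the benchmark tooling.

```python
import json

# Mandatory fields, as listed above; reasoning_trace stays optional.
REQUIRED_FIELDS = ("task_id", "agent_answer")


def write_submission(rows, path):
    # Validate every row first so a bad row never leaves a partial file behind.
    for row in rows:
        missing = [field for field in REQUIRED_FIELDS if field not in row]
        if missing:
            raise ValueError(f"row {row!r} is missing fields: {missing}")
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            # One JSON object per line; print() adds the line break.
            print(json.dumps(row, ensure_ascii=False), file=f)


def read_submission(path):
    # Parse the file back into a list of dicts, skipping blank lines.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `write_submission([{"task_id": "task_id_1", "agent_answer": "Answer 1 from your agent"}], "submission.jsonl")` writes a one-line file, and reading it back with `read_submission` is a quick sanity check before uploading.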
|
|
|
Our scoring function can be found [here](https://huggingface.co/spaces/adyen/data-agents-benchmark/blob/main/data_agents_benchmark/evaluation/scorer.py). |
|
""" |
|
|
|
CITATION_BUTTON_LABEL = "Copy the following snippet to cite these results" |
|
CITATION_BUTTON_TEXT = r"""@misc{data_agents_benchmark_2025, |
|
title={Data Agents Benchmark}, |
|
author={Martin Iglesias and Alex Egg and Friso Kingma},
|
year={2025}, |
|
month={January}, |
|
url={TBD} |
|
}""" |
|
|