Hanna Yukhymenko PRO
hannayukhymenko
AI & ML interests
Multilingual efficiency/safety @ ETHZ
Recent Activity
posted an update about 2 hours ago
Releasing the Jupyter Agent Dataset! 🚀
Built from 7 TB of real Kaggle datasets and 20k notebooks, with real code execution traces generated using Qwen3-Coder and E2B.
Training on this data dramatically improves the ability to execute code and analyze data.
We (@baptistecolle, @hannayukhymenko, @lvwerra) built a novel synthetic data generation pipeline with efficient scaffolding. Training your coding agent on its output gives a big performance boost 🔥
Starting from real Kaggle notebooks and datasets, we generate synthetic notebooks that analyze a dataset and answer factual questions about it more efficiently. Code execution is either simulated by prompting LLMs or run for real in E2B sandboxes.
The result is a dataset of 50k+ high-quality LLM-generated notebooks that can help your agent get better at data analysis and question answering.
Link: https://huggingface.co/datasets/data-agents/jupyter-agent-dataset
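To make "code exec traces" concrete, here is a minimal sketch of what recording one could look like. It is not the actual pipeline: `run_cells` and the trace fields are hypothetical names, the cells are toy data, and plain in-process `exec` stands in for an E2B sandbox (so it offers no isolation at all).

```python
import contextlib
import io


def run_cells(cells):
    """Run code cells in order in one shared namespace and record a trace.

    Hypothetical helper for illustration only. In-process exec() is a
    stand-in for a real sandbox and must not be used on untrusted code.
    """
    namespace = {}  # shared state across cells, like a notebook kernel
    trace = []
    for source in cells:
        stdout = io.StringIO()
        error = None
        try:
            # Capture anything the cell prints.
            with contextlib.redirect_stdout(stdout):
                exec(source, namespace)
        except Exception as exc:
            error = f"{type(exc).__name__}: {exc}"
        trace.append(
            {"source": source, "stdout": stdout.getvalue(), "error": error}
        )
        if error is not None:
            break  # stop at the first failing cell
    return trace


# Toy "notebook": build a tiny table, then answer a factual question about it.
cells = [
    "rows = [{'city': 'Kyiv', 'temp': 21}, {'city': 'Zurich', 'temp': 17}]",
    "print(max(rows, key=lambda r: r['temp'])['city'])",
]
trace = run_cells(cells)
failed = run_cells(["1 / 0", "print('never runs')"])
```

Each trace entry pairs a cell's source with what it printed and any error it raised, which is roughly the kind of signal an agent can learn from when it has to write code, run it, and read the result.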
published a dataset about 2 hours ago: data-agents/jupyter-agent-dataset