
Rasmus Aagaard

rasgaard

AI & ML interests

Interested in using LLMs in products, evaluation of those products and small models

Recent Activity

Organizations

Hugging Face Discord Community

rasgaard's activity

published an article 14 days ago

Scaling Expert judgment with Large Language Models (LLM-as-a-Judge)

By rasgaard
upvoted an article about 1 month ago

From Llasa to Llasagna 🍕: Finetuning LLaSA to generates Italian speech and other languages

By Steveeeeeeen and 1 other • 26
reacted to davanstrien's post with 🤗 about 2 months ago
Introducing scandi-fine-web-cleaner (davanstrien/scandi-fine-web-cleaner), the first model trained on FineWeb-C community annotations!

FineWeb2 is a massive multilingual dataset for pre-training language models. Like any web-scale dataset, it contains low-quality content. How can we improve it?

Over the past months, an amazing community of 400+ annotators has been labelling content quality (using Argilla) across 23 languages through the FineWeb-C initiative.

Today, I'm happy to share the first classifier trained on this data.

šŸ” What we've built:

- A lightweight classifier that efficiently removes low-quality content
- 90%+ precision demonstrated on Danish & Swedish
- Can process the 43M+ documents in Danish FineWeb2 with minimal compute

šŸŒ Why this matters: The approach can be reproduced for any of the 23 languages in FineWeb-C ( data-is-better-together/fineweb-c). We can improve training data quality at scale without massive compute resources by starting with community annotations and training small, efficient classifiers.

Want to build a classifier for your language? Check out the full blog post with code examples and implementation details: https://danielvanstrien.xyz/posts/2025/FineWeb-c/scandinavian-content-filtering-fineweb.html
upvoted an article 11 months ago

🪆 Introduction to Matryoshka Embedding Models

• 86