We are reproducing the full DeepSeek-R1 data and training pipeline so everyone can use their recipe. Instead of doing it in secret, we can do it together in the open!
🧪 Step 1: replicate the R1-Distill models by distilling a high-quality reasoning corpus from DeepSeek-R1 (see the first sketch after this list).
🧠 Step 2: replicate the pure RL pipeline that DeepSeek used to create R1-Zero. This will involve curating new, large-scale datasets for math, reasoning, and code (see the GRPO sketch below).
🔥 Step 3: show we can go from base model -> SFT -> RL via multi-stage training (see the final sketch below).
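
For Step 1, here is a minimal sketch of what the distillation data generation could look like: sample chain-of-thought completions from DeepSeek-R1 through an OpenAI-compatible client and store prompt/completion pairs for later SFT. The endpoint, model name, and example prompt are illustrative assumptions, not the final recipe.

```python
# Hypothetical sketch: collect reasoning traces from DeepSeek-R1 for distillation.
# Endpoint, model name, and prompt are placeholders, not the final recipe.
import json

from openai import OpenAI

# DeepSeek exposes an OpenAI-compatible API; any R1 endpoint would work here.
client = OpenAI(base_url="https://api.deepseek.com", api_key="YOUR_KEY")

def distill(prompts: list[str], out_path: str = "r1_distill.jsonl") -> None:
    """Sample one completion per prompt and write (prompt, completion) pairs."""
    with open(out_path, "w") as f:
        for prompt in prompts:
            resp = client.chat.completions.create(
                model="deepseek-reasoner",  # DeepSeek-R1 on the official API
                messages=[{"role": "user", "content": prompt}],
            )
            pair = {"prompt": prompt, "completion": resp.choices[0].message.content}
            f.write(json.dumps(pair) + "\n")

distill(["Prove that the sum of two odd integers is even."])
```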
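For Step 2, a toy sketch of the pure-RL stage, assuming TRL's `GRPOTrainer` (GRPO is the algorithm DeepSeek report using for R1-Zero). The model, dataset, and reward function here are placeholders; a real run would use verifiable rewards, e.g. checking a math answer against ground truth on curated data.

```python
# Hypothetical sketch of R1-Zero-style RL with TRL's GRPOTrainer.
# The model, dataset, and reward function are toy placeholders.
from datasets import Dataset
from trl import GRPOConfig, GRPOTrainer

# GRPOTrainer expects a dataset with a "prompt" column.
dataset = Dataset.from_dict({"prompt": ["What is 13 * 7? Think step by step."]})

def toy_reward(completions, **kwargs):
    # Stand-in for a verifiable reward: favour completions containing "91".
    return [1.0 if "91" in completion else 0.0 for completion in completions]

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B",  # small base model for illustration
    reward_funcs=toy_reward,
    args=GRPOConfig(output_dir="r1-zero-sketch", num_generations=4),
    train_dataset=dataset,
)
trainer.train()
```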
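And for Step 3, a sketch of how the stages could chain together: SFT on the distilled corpus from Step 1, then RL on top of the SFT checkpoint. Again, the model and file names are illustrative assumptions.

```python
# Hypothetical sketch of the multi-stage pipeline: base model -> SFT -> RL.
# File and model names are illustrative placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Stage 1: SFT on the distilled (prompt, completion) pairs from Step 1.
sft_dataset = load_dataset("json", data_files="r1_distill.jsonl", split="train")
sft_trainer = SFTTrainer(
    model="Qwen/Qwen2.5-0.5B",  # base model for illustration
    args=SFTConfig(output_dir="stage1-sft"),
    train_dataset=sft_dataset,
)
sft_trainer.train()

# Stage 2: run RL on the SFT checkpoint by passing "stage1-sft" as the model
# to GRPOTrainer, as in the previous sketch.
```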
I was initially pretty sceptical of Meta's Coconut paper [1], because the largest performance gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive!