llama.cpp is 26.8% faster than ollama. I upgraded both and, using the same settings, ran the same DeepSeek R1 Distill 1.5B model on the same hardware, so it's an apples-to-apples comparison.
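For anyone who wants to reproduce this kind of comparison, here is a minimal Python sketch. It assumes llama-server (from llama.cpp) is listening on localhost:8080 and Ollama on its default localhost:11434; the model tag, prompt, and the exact response fields (timings.predicted_per_second for llama-server, eval_count/eval_duration for Ollama) are assumptions based on the current server APIs and may differ across versions.

```python
# Minimal sketch: compare generation speed (tokens/sec) of llama.cpp's
# llama-server vs Ollama on the same model, prompt, and sampling settings.
# Endpoints and response fields are assumptions and may vary by version:
#   - llama-server: POST /completion, returns a "timings" object
#   - Ollama:       POST /api/generate, returns eval_count / eval_duration
import requests

PROMPT = "Explain the Pythagorean theorem in one paragraph."
N_PREDICT = 256  # generate the same number of tokens on both servers

def bench_llamacpp(base="http://localhost:8080"):
    # llama-server's native /completion endpoint; n_predict and
    # temperature mirror the settings used on the Ollama side below.
    r = requests.post(f"{base}/completion", json={
        "prompt": PROMPT,
        "n_predict": N_PREDICT,
        "temperature": 0.8,
    })
    r.raise_for_status()
    t = r.json()["timings"]  # assumed field with per-token timing stats
    return t["predicted_per_second"]

def bench_ollama(model="deepseek-r1:1.5b", base="http://localhost:11434"):
    # Ollama's generate endpoint with streaming off so the final JSON
    # includes eval_count (tokens) and eval_duration (nanoseconds).
    r = requests.post(f"{base}/api/generate", json={
        "model": model,
        "prompt": PROMPT,
        "stream": False,
        "options": {"num_predict": N_PREDICT, "temperature": 0.8},
    })
    r.raise_for_status()
    d = r.json()
    return d["eval_count"] / (d["eval_duration"] / 1e9)

if __name__ == "__main__":
    lc = bench_llamacpp()
    ol = bench_ollama()
    print(f"llama.cpp: {lc:.1f} tok/s, ollama: {ol:.1f} tok/s, "
          f"speedup: {(lc / ol - 1) * 100:.1f}%")
```

For a fair number, run each server once to warm up (so model load time isn't counted), then average several trials with the same prompt and token budget on both sides.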
I was initially pretty sceptical about Meta's Coconut paper [1] because the largest performance gains were reported on toy linguistic problems. However, these results on machine translation are pretty impressive!