Marco Zocca (ocramz)
AI & ML interests

Program understanding, languages and compilers

Recent Activity

liked a model 4 days ago
Qwen/Qwen2.5-Coder-3B-Instruct
liked a model 4 days ago
Qwen/Qwen2.5-VL-3B-Instruct
liked a model 5 days ago
manycore-research/SpatialLM-Llama-1B

Organizations

UnfoldML · BigCode

ocramz's activity

reacted to onekq's post with 👍 2 months ago
So 🐋DeepSeek🐋 has hit the mainstream media. But it has been a star in our little cult for at least six months. Its meteoric success was not overnight; it was two years in the making.

To trace their history, just look at their 🤗 repo deepseek-ai

* End of 2023: they launched their first model (pretrained by themselves), following the Llama 2 architecture
* June 2024: v2 (MoE architecture) surpassed Gemini 1.5, but remained behind Mistral
* September 2024: v2.5 surpassed GPT-4o mini
* December 2024: v3 surpassed GPT-4o
* Now: R1 has surpassed o1

Most importantly, if you think DeepSeek's success is singular and unrivaled, that's WRONG. The following models are also at or near the o1 bar:

* Minimax-01
* Kimi k1.5
* Doubao 1.5 pro