Collections

Collections including paper arxiv:2402.17764

A collection of arXiv papers from Chip Huyen's AI Engineering organized by chapter and ordered by when each appears in the book.

Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning

Paper • 2211.04325 • Published Oct 26, 2022
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 17
On the Opportunities and Risks of Foundation Models

Paper • 2108.07258 • Published Aug 16, 2021
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Paper • 2204.07705 • Published Apr 16, 2022 • 1

deepseek-ai/DeepSeek-R1

Text Generation • Updated 18 days ago • 2.75M • 11.3k
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 610
MiniMax-01: Scaling Foundation Models with Lightning Attention

Paper • 2501.08313 • Published Jan 14, 2025 • 276
open-r1/OpenR1-Math-220k

Viewer • Updated 24 days ago • 450k • 53k • 492

Foundation AI Papers

Curated List of Must-Reads on LLM reasoning at Temus AI team

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Paper • 2310.04406 • Published Oct 6, 2023 • 9
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15, 2024 • 105
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14, 2024 • 6
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6, 2024 • 115

Medical License Exam

deepseek-ai/DeepSeek-R1

Text Generation • Updated 18 days ago • 2.75M • 11.3k
Congliu/Chinese-DeepSeek-R1-Distill-data-110k

Viewer • Updated 21 days ago • 110k • 7.74k • 525
Running

2.24k

The Ultra-Scale Playbook

🌌

The ultimate guide to training LLM on large GPU Clusters
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 610

Running on Zero

252

Lumina Brush Uniform Lit

📈

Execute custom code from environment variables
deepseek-ai/DeepSeek-R1

Text Generation • Updated 18 days ago • 2.75M • 11.3k
deepseek-ai/Janus-Pro-7B

Any-to-Any • Updated Feb 1 • 264k • 3.21k
hexgrad/Kokoro-82M

Text-to-Speech • Updated 10 days ago • 1.58M • 3.66k

Super Intelligence

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 610

mistralai/Mistral-7B-Instruct-v0.3

Text Generation • Updated Aug 21, 2024 • 905k • 1.48k
black-forest-labs/FLUX.1-dev

Text-to-Image • Updated Aug 16, 2024 • 2.73M • 9.32k
PKU-Alignment/align-anything

Viewer • Updated 6 days ago • 69.4k • 6.99k • 29
NousResearch/hermes-function-calling-v1

Viewer • Updated Aug 30, 2024 • 11.6k • 2.06k • 269

1.58-bit FLUX

Paper • 2412.18653 • Published Dec 24, 2024 • 80
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 610
BitNet a4.8: 4-bit Activations for 1-bit LLMs

Paper • 2411.04965 • Published Nov 7, 2024 • 66
BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 97

royalmatrimonial

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27, 2024 • 610
Qwen2.5 Technical Report

Paper • 2412.15115 • Published Dec 19, 2024 • 352
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22, 2024 • 256
LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 259

Computer Vision

SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints

Paper • 2412.07760 • Published Dec 10, 2024 • 50
MoViE: Mobile Diffusion for Video Editing

Paper • 2412.06578 • Published Dec 9, 2024 • 19
Video Motion Transfer with Diffusion Transformers

Paper • 2412.07776 • Published Dec 10, 2024 • 17
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment

Paper • 2412.04814 • Published Dec 6, 2024 • 47
