-
Will we run out of data? An analysis of the limits of scaling datasets in Machine Learning
Paper • 2211.04325 • Published
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Paper • 1810.04805 • Published • 17
On the Opportunities and Risks of Foundation Models
Paper • 2108.07258 • Published
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Paper • 2204.07705 • Published • 1
Collections including paper arxiv:2402.17764
-
deepseek-ai/DeepSeek-R1
Text Generation • Updated • 2.75M • 11.3k
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 610
MiniMax-01: Scaling Foundation Models with Lightning Attention
Paper • 2501.08313 • Published • 276
open-r1/OpenR1-Math-220k
Viewer • Updated • 450k • 53k • 492
-
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models
Paper • 2310.04406 • Published • 9
Chain-of-Thought Reasoning Without Prompting
Paper • 2402.10200 • Published • 105
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization
Paper • 2402.09320 • Published • 6
Self-Discover: Large Language Models Self-Compose Reasoning Structures
Paper • 2402.03620 • Published • 115
-
deepseek-ai/DeepSeek-R1
Text Generation • Updated • 2.75M • 11.3k
Congliu/Chinese-DeepSeek-R1-Distill-data-110k
Viewer • Updated • 110k • 7.74k • 525
2.24k
The Ultra-Scale Playbook
🌌The ultimate guide to training LLM on large GPU Clusters
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 610
-
mistralai/Mistral-7B-Instruct-v0.3
Text Generation • Updated • 905k • 1.48k
black-forest-labs/FLUX.1-dev
Text-to-Image • Updated • 2.73M • 9.32k
PKU-Alignment/align-anything
Viewer • Updated • 69.4k • 6.99k • 29
NousResearch/hermes-function-calling-v1
Viewer • Updated • 11.6k • 2.06k • 269
-
1.58-bit FLUX
Paper • 2412.18653 • Published • 80
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 610
BitNet a4.8: 4-bit Activations for 1-bit LLMs
Paper • 2411.04965 • Published • 66
BitNet: Scaling 1-bit Transformers for Large Language Models
Paper • 2310.11453 • Published • 97
-
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits
Paper • 2402.17764 • Published • 610
Qwen2.5 Technical Report
Paper • 2412.15115 • Published • 352
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
Paper • 2404.14219 • Published • 256
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Paper • 2312.11514 • Published • 259
-
SynCamMaster: Synchronizing Multi-Camera Video Generation from Diverse Viewpoints
Paper • 2412.07760 • Published • 50
MoViE: Mobile Diffusion for Video Editing
Paper • 2412.06578 • Published • 19
Video Motion Transfer with Diffusion Transformers
Paper • 2412.07776 • Published • 17
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment
Paper • 2412.04814 • Published • 47