Collections including paper arxiv:2406.11931

- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 44
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 47
- DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
  Paper • 2401.14196 • Published • 54
- DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
  Paper • 2402.03300 • Published • 84

- deepseek-ai/DeepSeek-Coder-V2-Instruct
  Text Generation • Updated • 81.5k • 557
- deepseek-ai/DeepSeek-Coder-V2-Base
  Text Generation • Updated • 438 • 64
- deepseek-ai/DeepSeek-Coder-V2-Lite-Base
  Text Generation • Updated • 13k • 71
- deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct
  Text Generation • Updated • 168k • 369
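
For reference, a minimal sketch of how one of the checkpoints listed above could be loaded with the Hugging Face transformers library. The model ID comes from the list; the chat prompt, dtype, and generation settings are illustrative assumptions, not details taken from this page.

```python
# Minimal sketch: load a listed checkpoint and run one chat turn.
# Assumptions: a GPU with bf16 support; settings below are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,  # the repo ships custom modeling code
)

messages = [{"role": "user", "content": "Write a quicksort in Python."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[1]:], skip_special_tokens=True))
```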

- DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
  Paper • 2401.02954 • Published • 44
- DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models
  Paper • 2401.06066 • Published • 47
- DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model
  Paper • 2405.04434 • Published • 17
- DeepSeek-VL: Towards Real-World Vision-Language Understanding
  Paper • 2403.05525 • Published • 43

- Phi-4 Technical Report
  Paper • 2412.08905 • Published • 106
- Evaluating and Aligning CodeLLMs on Human Preference
  Paper • 2412.05210 • Published • 47
- Evaluating Language Models as Synthetic Data Generators
  Paper • 2412.03679 • Published • 46
- Yi-Lightning Technical Report
  Paper • 2412.01253 • Published • 27

- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 87
- Transformers are SSMs: Generalized Models and Efficient Algorithms Through Structured State Space Duality
  Paper • 2405.21060 • Published • 64
- Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
  Paper • 2405.20541 • Published • 22
- MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark
  Paper • 2406.01574 • Published • 45

- Bootstrapping Language Models with DPO Implicit Rewards
  Paper • 2406.09760 • Published • 39
- DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence
  Paper • 2406.11931 • Published • 62
- Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs
  Paper • 2406.14544 • Published • 35
- Instruction Pre-Training: Language Models are Supervised Multitask Learners
  Paper • 2406.14491 • Published • 87