MoE Models
Every FLOP Counts: Scaling a 300B Mixture-of-Experts LING LLM without Premium GPUs • Paper • 2503.05139 • Published Mar 7, 2025
Reasoning Models
InfiR: Crafting Effective Small Language Models and Multimodal Small Language Models in Reasoning • Paper • 2502.11573 • Published Feb 17, 2025
JohnRoger/Huihui-MoE-12B-A4B-abliterated-Q8_0-GGUF • Text Generation • 12B • Updated Jun 16, 2025
JohnRoger/DeepSeek-Qwen2.5-14B-DeepThinker-v2-Q8_0-GGUF • Text Generation • 68.8M • Updated Apr 23, 2025