inclusionAI
/

Ring-lite-linear-preview

Text Generation

bailing_moe_linear

Model card Files Files and versions

zhangdonghao commited on Apr 24

Commit

526d552

·

verified ·

1 Parent(s): cf0474d

Update README.md

Files changed (1) hide show

README.md +1 -1

README.md CHANGED Viewed

@@ -21,7 +21,7 @@ pipeline_tag: text-generation
 ## Introduction
-Ring-lite-linear-preview is a hybrid-linear MoE LLM provided and open-sourced by InclusionAI, which has 17.1B parameters with 3.0B activated parameters. It is a long reasoning model based on hybrid-linear attention, achieving near-linear computational complexity and near-constant space complexity during inference. This model was converted from [Ling-lite-0220](https://huggingface.co/inclusionAI/Ling-lite), which adopts the softmax attention-based architecture. It matches the performance of DeepSeek-R1-Distill-Qwen-7B on standardized reasoning benchmarks while substantially reducing computational overhead in both training and inference phases. In certain generation speed tests based on vLLM, we observed that the throughput was more than doubled compared to softmax attention models of the same scale (e.g., Ling-lite). To the best of our knowledge, it is the first open-source hybrid-linear reasoning language model.
 ## Model Downloads
 <div align="center">

 ## Introduction
+Ring-lite-linear-preview is a hybrid-linear MoE LLM provided and open-sourced by InclusionAI, which has 17.1B parameters with 3.0B activated parameters. It is a long reasoning model based on hybrid-linear attention, achieving near-linear computational complexity and near-constant space complexity during inference. This model was converted from [Ling-lite-0220](https://huggingface.co/inclusionAI/Ling-lite/tree/Ling-lite-0220), which adopts the softmax attention-based architecture. It matches the performance of DeepSeek-R1-Distill-Qwen-7B on standardized reasoning benchmarks while substantially reducing computational overhead in both training and inference phases. In certain generation speed tests based on vLLM, we observed that the throughput was more than doubled compared to softmax attention models of the same scale (e.g., Ling-lite). To the best of our knowledge, it is the first open-source hybrid-linear reasoning language model.
 ## Model Downloads
 <div align="center">