# Quasar-40B

Quasar-40B is a high-performance language model built on Kwai-Klear's Klear-46B-A2.5B-Base, which it renames and enhances. It uses a Mixture of Experts (MoE) Transformer architecture pretrained on 22 trillion tokens. Quasar-40B introduces key changes over the base model: more experts activated per token, a Positional Memory Bank, and the removal of caching mechanisms.
## About Quasar-40B

Quasar-40B targets state-of-the-art natural language processing with improved efficiency and scalability. Building on the Klear-46B-A2.5B-Base model, it addresses that model's limitations and introduces the following features:
- Increased Experts per Token: More experts are activated for each token in the MoE routing, allowing greater specialization and better performance (see the routing sketch after this list).
- Positional Memory Bank: Enhances context retention for long-sequence tasks, improving coherence and accuracy.
- Removed Caching: Eliminates caching to reduce memory overhead and improve inference speed.
These enhancements make Quasar-40B ideal for research, production deployment, and fine-tuning for specialized tasks.
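The sketch below illustrates what top-k expert routing in an MoE layer looks like and why raising the number of experts per token (the `top_k` value) changes the computation. The layer sizes, expert count, and `top_k` value here are illustrative assumptions, not Quasar-40B's actual configuration.

```python
# Minimal top-k MoE routing sketch (illustrative sizes, not Quasar-40B's real config).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model=1024, d_ff=4096, num_experts=64, top_k=8):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, num_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        )

    def forward(self, x):  # x: (batch, seq, d_model)
        logits = self.router(x)                              # (batch, seq, num_experts)
        weights, indices = logits.topk(self.top_k, dim=-1)   # pick top_k experts per token
        weights = F.softmax(weights, dim=-1)                 # normalize over selected experts
        out = torch.zeros_like(x)
        # Each token is processed by its top_k experts and the outputs are mixed by
        # the router weights; a larger top_k means more experts per token.
        for slot in range(self.top_k):
            for e in range(len(self.experts)):
                mask = indices[..., slot] == e               # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[..., slot][mask].unsqueeze(-1) * self.experts[e](x[mask])
        return out
```

Increasing `top_k` raises per-token compute and router mixing capacity while the total parameter count stays fixed, which is the trade-off the "increased experts per token" enhancement refers to.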
## Model Details
- Base Model: Kwai-Klear/Klear-46B-A2.5B-Base
- Architecture: Mixture of Experts (MoE) Transformer
- Model Type: Quasar
- Pretraining Data: 22 trillion tokens from diverse text sources
- Key Enhancements:
  - Increased number of experts per token
  - Positional Memory Bank for improved context retention
  - Removed caching for faster inference and lower memory usage
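A minimal usage sketch with Hugging Face `transformers` is shown below. The repository id `"Quasar-40B"` is a placeholder assumption (the model card does not state the hub path), and the generation settings are illustrative.

```python
# Usage sketch, assuming a hypothetical hub repo id "Quasar-40B".
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Quasar-40B"  # placeholder; substitute the actual repository path

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # keep memory manageable for a large MoE checkpoint
    device_map="auto",
    trust_remote_code=True,       # custom Quasar architecture code would live in the repo
)

prompt = "Explain Mixture of Experts routing in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# use_cache=False mirrors the model card's "removed caching" design note
outputs = model.generate(**inputs, max_new_tokens=128, use_cache=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```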
VL (vision-language) support is coming soon.