Tri-7B-Base
Introduction
We present Tri-7B-Base, a foundation language model that serves as the pre-trained base for our Tri-7B model family. The model was pre-trained with an emphasis on computational efficiency and is intended as a strong starting point for downstream fine-tuning and adaptation.
Key Features
- Foundation Architecture: Transformer decoder architecture with RoPE, SwiGLU, and RMSNorm
- Multilingual Foundation: Pre-trained on diverse Korean, English, and Japanese data
- Efficient Training: Training methodology optimized to reduce computational cost
Model Specifications
Tri-7B-Base
- Type: Causal Language Model
- Training Stage: Pre-training
- Architecture: Transformer Decoder with RoPE, SwiGLU, RMSNorm
- Number of Parameters: 7.76B
- Number of Layers: 32
- Number of Attention Heads: 32
- Context Length: 4,096
- Vocab Size: 128,128
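The specifications above can be cross-checked programmatically from the model configuration. The following is a minimal sketch using the Hugging Face transformers library; the repository ID trillionlabs/Tri-7B-Base is an assumption (replace it with the actual model path), and the attribute names assume a standard Llama-style decoder config.

```python
from transformers import AutoConfig, AutoTokenizer

# Hypothetical repository ID -- replace with the actual model path.
MODEL_ID = "trillionlabs/Tri-7B-Base"

config = AutoConfig.from_pretrained(MODEL_ID)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# Cross-check the published specifications against the loaded config.
print("layers:         ", config.num_hidden_layers)        # expected: 32
print("attention heads:", config.num_attention_heads)      # expected: 32
print("context length: ", config.max_position_embeddings)  # expected: 4096
print("vocab size:     ", len(tokenizer))                  # expected: 128,128
```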
Use Cases
As a base model, Tri-7B-Base is designed to serve as a foundation for various downstream applications:
- Fine-tuning: Adapt to specific domains or tasks
- Instruction Tuning: Create chat or assistant models
- Domain Specialization: Customize for specific industries or use cases
- Research: Explore model behaviors and capabilities
- Language Generation: General text completion and generation tasks (see the sketch after this list)
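As a concrete illustration of base-model text completion, the sketch below loads the model with transformers and continues a plain-text prompt. The repository ID trillionlabs/Tri-7B-Base is again an assumption, and the generation settings are illustrative defaults; because this is a base model without instruction tuning, prompts are plain text rather than chat-formatted.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical repository ID -- replace with the actual model path.
MODEL_ID = "trillionlabs/Tri-7B-Base"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,  # keep the 7.76B parameters in half precision
    device_map="auto",           # place weights on available GPU(s)/CPU
)

# Base model: use a plain text prompt, not a chat template.
prompt = "The capital of South Korea is"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```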
Limitations
- Base Model Nature: This is a pre-trained base model without instruction tuning or alignment. For chat or assistant capabilities, consider fine-tuned variants.
- Language Support: The model is optimized for English, Korean, and Japanese. Usage with other languages may result in degraded performance.
- Knowledge Cutoff: The model's information is limited to data available up to February 2025.
- Generation Quality: As a base model, outputs may require post-processing or filtering for production use cases.
License
This model is licensed under the Apache License 2.0.
Contact
For inquiries, please contact: [email protected]