metadata
title: SmoLLMv2
emoji: 🐢
colorFrom: indigo
colorTo: blue
sdk: gradio
sdk_version: 5.13.1
app_file: app.py
pinned: false
license: mit
short_description: Text generation using smollmv2-135M model
SmoLLMv2: A Small but Efficient Language Model
SmoLLMv2 is a 135M-parameter language model designed for efficient text generation. It incorporates several modern architectural improvements while keeping a small footprint.
Features
Efficient Architecture:
- 30 transformer layers
- 9 attention heads
- 576 embedding dimension
- Memory-efficient attention with reduced KV dimensions
- Rotary Position Embeddings (RoPE)
- SwiGLU activation function
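The sizes listed above determine how much memory generation needs for the key/value (KV) cache. The short calculation below compares full-width keys and values with the 189-dimensional projections described under Model Architecture. It is a back-of-the-envelope sketch that rests on two assumptions not taken from the repository's code: the cache stores keys and values at their projected width, and each value occupies 2 bytes (16-bit).

```python
# Back-of-the-envelope KV-cache arithmetic using the sizes listed above.
# Assumptions (not from the source code): the cache stores keys and values
# at their projected width, and values are kept in 16-bit (2 bytes each).

NUM_LAYERS = 30
NUM_HEADS = 9
EMBED_DIM = 576
REDUCED_KV_DIM = 189  # width stated under "Model Architecture"
BYTES_PER_VALUE = 2   # fp16 / bf16


def kv_cache_bytes_per_token(num_layers: int, kv_dim: int,
                             bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Bytes needed to cache one token's keys and values across all layers."""
    # Factor 2: each layer stores one key vector and one value vector.
    return 2 * num_layers * kv_dim * bytes_per_value


head_dim = EMBED_DIM // NUM_HEADS  # 576 / 9
full = kv_cache_bytes_per_token(NUM_LAYERS, EMBED_DIM)
reduced = kv_cache_bytes_per_token(NUM_LAYERS, REDUCED_KV_DIM)

print(f"head dim: {head_dim}")
print(f"full-width KV cache: {full} bytes/token")
print(f"reduced KV cache:    {reduced} bytes/token ({reduced / full:.1%} of full)")
```

With these numbers the head dimension is 64. The reduced cache needs 22,680 bytes per token versus 69,120 for full-width keys and values, about 32.8% of the full size.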
Training Optimizations:
- Mixed-precision training (16-bit)
- Gradient accumulation
- OneCycleLR scheduler
- Streaming dataset support
- Automatic model compilation with torch.compile (PyTorch 2.0+)
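Gradient accumulation and the OneCycleLR scheduler interact in one way that is easy to get wrong: the scheduler should advance once per optimizer step, not once per micro-batch. The sketch below illustrates this in plain Python. `one_cycle_lr` is a simplified re-derivation of the default two-phase cosine shape used by PyTorch's `torch.optim.lr_scheduler.OneCycleLR` (`pct_start=0.3`, `div_factor=25`, `final_div_factor=1e4`), and it ignores momentum cycling. The function names, `max_lr`, and the batch counts are hypothetical and do not come from this project's training code.

```python
import math


def one_cycle_lr(step: int, total_steps: int, max_lr: float,
                 pct_start: float = 0.3, div_factor: float = 25.0,
                 final_div_factor: float = 1e4) -> float:
    """Two-phase cosine one-cycle schedule: warm up to max_lr, then anneal."""
    initial_lr = max_lr / div_factor
    min_lr = initial_lr / final_div_factor
    warmup_end = pct_start * total_steps - 1  # last step of the warm-up phase
    assert warmup_end > 0, "total_steps too small for this pct_start"

    def cos_anneal(start: float, end: float, pct: float) -> float:
        # pct=0 returns start, pct=1 returns end, cosine-shaped in between
        return end + (start - end) / 2.0 * (math.cos(math.pi * pct) + 1)

    if step <= warmup_end:
        return cos_anneal(initial_lr, max_lr, step / warmup_end)
    return cos_anneal(max_lr, min_lr,
                      (step - warmup_end) / (total_steps - 1 - warmup_end))


def simulate_schedule(num_micro_batches: int, accum_steps: int,
                      max_lr: float) -> list[float]:
    """Return the LR used at each optimizer step under gradient accumulation."""
    total_opt_steps = num_micro_batches // accum_steps
    lrs: list[float] = []
    opt_step = 0
    for micro_step in range(num_micro_batches):
        # In real training: (loss / accum_steps).backward() adds to .grad here.
        if (micro_step + 1) % accum_steps == 0:
            lrs.append(one_cycle_lr(opt_step, total_opt_steps, max_lr))
            # optimizer.step(); optimizer.zero_grad(); scheduler.step()
            opt_step += 1
    return lrs


lrs = simulate_schedule(num_micro_batches=800, accum_steps=8, max_lr=3e-3)
print(len(lrs), lrs.index(max(lrs)))
```

With 800 micro-batches and 8 accumulation steps there are 100 optimizer steps. The learning rate peaks at step 29, the end of the 30% warm-up, and the script prints `100 29`. The effective batch size is the micro-batch size multiplied by `accum_steps`.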
Model Architecture
SmoLLMv2 incorporates several efficiency improvements:
- Reduced KV Dimensions: Uses 189-dimensional key/value projections (instead of full 576) to save memory and computation.
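- RoPE Attention: Implements Rotary Position Embeddings, which encode each token's position by rotating its query and key vectors, so attention scores depend on the relative distance between tokens.
- SwiGLU Activation: Uses a SwiGLU (SiLU-gated linear unit) feed-forward block in the MLP layers, a gated variant reported to improve quality over plain ReLU/GELU MLPs.
- Weight Sharing: Ties the input embedding matrix to the output projection, so both use the same weights and the parameter count is reduced.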
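The three mechanisms above can be illustrated on plain Python lists. This is a minimal sketch and not the repository's PyTorch code. RoPE here rotates interleaved (even, odd) pairs with the common base of 10000; that base is an assumption, and some implementations rotate the first and second halves of the vector instead, which yields the same relative-position property. All function and matrix names (`rope`, `swiglu_mlp`, `tied_logits`, `w_gate`, `w_up`, `w_down`) are hypothetical.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def matvec(w: list[list[float]], x: list[float]) -> list[float]:
    """Multiply a (rows x cols) matrix by a vector."""
    return [dot(row, x) for row in w]


# --- RoPE: rotate each (even, odd) pair by a position-dependent angle ---
def rope(x: list[float], pos: int, base: float = 10000.0) -> list[float]:
    d = len(x)  # must be even
    out: list[float] = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)  # pair i//2 uses frequency base^(-i/d)
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + 1]
        out += [a * c - b * s, a * s + b * c]
    return out


# --- SwiGLU feed-forward: down(silu(gate(x)) * up(x)) ---
def silu(z: float) -> float:
    return z / (1.0 + math.exp(-z))


def swiglu_mlp(x: list[float], w_gate: list[list[float]],
               w_up: list[list[float]], w_down: list[list[float]]) -> list[float]:
    gate = [silu(g) for g in matvec(w_gate, x)]
    up = matvec(w_up, x)
    return matvec(w_down, [g * u for g, u in zip(gate, up)])


# --- Weight sharing: output logits reuse the embedding matrix ---
def tied_logits(hidden: list[float], embedding: list[list[float]]) -> list[float]:
    # Row t of `embedding` is token t's input vector; its dot product with
    # the final hidden state is token t's output logit.
    return matvec(embedding, hidden)


q, k = [0.3, -1.2, 0.5, 0.8], [1.0, 0.4, -0.7, 0.2]
# Same offset (2 positions) gives the same attention score.
print(round(dot(rope(q, 5), rope(k, 3)), 6), round(dot(rope(q, 12), rope(k, 10)), 6))
```

Because every pair is rotated by an angle proportional to position, the score between positions 5 and 3 equals the score between 12 and 10. Rotation also leaves vector norms unchanged.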
Configuration
The model's behavior can be customized through the following configuration classes in config.py:
- SmollmConfig: Core model architecture and training parameters
- RoPEConfig: Rotary Position Embedding settings
- OptimizerConfig: Optimization and learning rate settings
- DataConfig: Dataset and tokenizer configuration
- TrainerConfig: Training infrastructure settings
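A configuration split like this is often expressed with nested dataclasses. The sketch below shows one possible layout. Only the class names and the model sizes (30 layers, 9 heads, 576 embedding dimension, 189-dimensional KV) come from this README. Every field name, the wrapper class `FullConfig`, and the values marked as placeholders are hypothetical and were not read from config.py. `"16-mixed"` and `accumulate_grad_batches` follow Lightning 2.x Trainer conventions.

```python
from dataclasses import dataclass, field, replace

# Hypothetical field names; only the numeric model sizes come from this README.


@dataclass
class SmollmConfig:
    n_layers: int = 30
    n_heads: int = 9
    n_embd: int = 576
    kv_dim: int = 189                 # reduced key/value width
    tie_word_embeddings: bool = True  # weight sharing with the output projection

    @property
    def head_dim(self) -> int:
        return self.n_embd // self.n_heads


@dataclass
class RoPEConfig:
    base: float = 10000.0  # common RoPE default; assumed, not read from config.py


@dataclass
class OptimizerConfig:
    max_lr: float = 3e-3    # placeholder value
    pct_start: float = 0.3  # mirrors OneCycleLR's default warm-up fraction


@dataclass
class DataConfig:
    streaming: bool = True  # stream the dataset instead of downloading it


@dataclass
class TrainerConfig:
    precision: str = "16-mixed"       # Lightning 2.x mixed-precision flag
    accumulate_grad_batches: int = 4  # placeholder value
    compile_model: bool = True        # wrap the model with torch.compile


@dataclass
class FullConfig:
    model: SmollmConfig = field(default_factory=SmollmConfig)
    rope: RoPEConfig = field(default_factory=RoPEConfig)
    optimizer: OptimizerConfig = field(default_factory=OptimizerConfig)
    data: DataConfig = field(default_factory=DataConfig)
    trainer: TrainerConfig = field(default_factory=TrainerConfig)


cfg = FullConfig()
# Override one setting without mutating the defaults.
tiny = replace(cfg, model=replace(cfg.model, n_layers=4))
print(cfg.model.head_dim, tiny.model.n_layers, cfg.model.n_layers)
```

`dataclasses.replace` returns a modified copy, so experiments can override individual fields while the defaults stay intact. The script prints `64 4 30`.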
Dataset
The model is trained on the Cosmopedia dataset, which is streamed during training to handle large-scale data efficiently.
Requirements
See requirements.txt for full dependencies. Key requirements:
- PyTorch ≥ 2.0.0
- Transformers ≥ 4.30.0
- Lightning ≥ 2.0.0
- Gradio ≥ 5.13.1