---
datasets:
- stanfordnlp/imdb
language:
- en
library_name: swarmformer
---

# Model Card for SwarmFormer-Small

SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.

## Model Details

### Model Description

A compact version of SwarmFormer with:
- Token embedding layer with dropout (0.3)
- Two SwarmFormer layers
- Mean pooling and a classification head
- Optimized for shorter sequences

- **Developed by:** Jordan Legg, Mikus Sturmanis, Takara.ai
- **Funded by:** Takara.ai
- **Shared by:** Takara.ai
- **Model type:** Hierarchical transformer
- **Language(s):** English
- **License:** Not specified
- **Finetuned from model:** Trained from scratch

### Model Sources

- **Repository:** https://github.com/takara-ai/SwarmFormer
- **Paper:** Takara.ai Research
- **Demo:** Not available

## Uses

### Direct Use

- Text classification
- Sentiment analysis
- Resource-constrained environments

### Out-of-Scope Use

- Text generation
- Machine translation
- Tasks requiring sequences longer than 256 tokens
- Tasks requiring high precision

## Training Details

### Training Data

- Dataset: IMDB movie reviews (stanfordnlp/imdb)
- Size: 50,000 samples
- Data augmentation techniques applied

### Training Procedure

#### Model Architecture Details

1. **Token Embedding Layer**
   - Embedding layer (vocab_size → 128)
   - Dropout rate: 0.3

2. **Local Swarm Aggregator**
   - Input dropout: 0.3
   - Local MLP:
     - Linear(128 → 128)
     - GELU
     - Dropout(0.3)
     - Linear(128 → 128)
   - Gate network with GELU

3. **Clustering Mechanism**
   - Cluster size: 8 tokens
   - Mean pooling per cluster

4. **Global Cluster Attention**
   - Q/K/V projections: Linear(128 → 128)
   - Attention dropout: 0.3

An illustrative sketch of the clustering and global cluster attention steps is included at the end of this card.

#### Training Hyperparameters

- Embedding dimension: 128
- Number of layers: 2
- Local update steps: 3
- Cluster size: 8
- Sequence length: 256
- Batch size: 96
- Learning rate: 4.76 × 10⁻⁴
- Weight decay: 0.0541
- Dropout: 0.30

## Evaluation

### Results

- Accuracy: 86.20%
- Precision: 83.46%
- Recall: 90.31%
- F1: 86.75%
- Inference time: 0.36 s (25k samples)
- Mean batch latency: 3.67 ms
- Throughput: 45k samples/s
- Peak memory: 8 GB

## Technical Specifications

### Compute Infrastructure

- GPU: NVIDIA RTX 2080 Ti
- VRAM: 8 GB minimum
- Training time: 3.6 minutes

### How to Get Started

```python
from swarmformer import SwarmFormerModel

model = SwarmFormerModel(
    vocab_size=30000,
    d_model=128,
    seq_len=256,
    cluster_size=8,
    num_layers=2,
    T_local=3
)
```

A fuller, illustrative inference example is included at the end of this card.

## Citation

```bibtex
@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}
```

## Model Card Authors

Jordan Legg, Mikus Sturmanis, Takara.ai Research Team

## Model Card Contact

research@takara.ai
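## Illustrative Sketch: Clustering and Global Cluster Attention

The snippet below is a minimal, self-contained sketch of the clustering and global cluster attention steps described in the architecture details above, not the reference implementation: token representations are grouped into fixed windows of 8, mean-pooled into cluster vectors, attended over globally with 128 → 128 Q/K/V projections, and broadcast back to their member tokens. The tensor names and the single-head attention wiring are assumptions made for illustration; the local swarm aggregator and gating are omitted. See the repository for the actual code.

```python
import torch
import torch.nn as nn

d_model, cluster_size, seq_len = 128, 8, 256
batch = 2

# Token-level representations after the local swarm updates (random stand-in here).
tokens = torch.randn(batch, seq_len, d_model)

# Clustering mechanism: group tokens into fixed windows of 8 and mean-pool each window.
num_clusters = seq_len // cluster_size                     # 256 / 8 = 32 clusters
clusters = tokens.view(batch, num_clusters, cluster_size, d_model).mean(dim=2)

# Global cluster attention (illustrative single-head version, 128 -> 128 projections).
q_proj = nn.Linear(d_model, d_model)
k_proj = nn.Linear(d_model, d_model)
v_proj = nn.Linear(d_model, d_model)
attn_dropout = nn.Dropout(0.3)

q, k, v = q_proj(clusters), k_proj(clusters), v_proj(clusters)
scores = q @ k.transpose(-2, -1) / d_model ** 0.5          # (batch, 32, 32)
weights = attn_dropout(torch.softmax(scores, dim=-1))
cluster_out = weights @ v                                  # (batch, 32, 128)

# Broadcast the updated cluster representations back to their member tokens.
token_update = cluster_out.unsqueeze(2).expand(-1, -1, cluster_size, -1)
token_update = token_update.reshape(batch, seq_len, d_model)
print(token_update.shape)  # torch.Size([2, 256, 128])
```

The shapes mirror the configuration above: 256 tokens form 32 clusters of 8, each represented by a 128-dimensional vector.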
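## Illustrative Inference Example

Extending the constructor call shown under "How to Get Started", the sketch below runs an untrained model on a dummy batch. It assumes the model's forward pass accepts a `(batch_size, seq_len)` tensor of token IDs and returns per-class logits, and that label 1 corresponds to positive sentiment; the tokenizer, pretrained weights, and exact forward signature should be taken from the repository.

```python
import torch
from swarmformer import SwarmFormerModel

# Configuration used for SwarmFormer-Small.
model = SwarmFormerModel(
    vocab_size=30000,
    d_model=128,
    seq_len=256,
    cluster_size=8,
    num_layers=2,
    T_local=3,
)
model.eval()

# Dummy batch of two padded, pre-tokenized reviews (token IDs in [0, vocab_size)).
input_ids = torch.randint(0, 30000, (2, 256))

with torch.no_grad():
    logits = model(input_ids)        # assumed output shape: (batch_size, num_classes)
    preds = logits.argmax(dim=-1)    # assumed label order: 0 = negative, 1 = positive

print(preds.tolist())
```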