---
datasets:
- stanfordnlp/imdb
language:
- en
---
Model Card for SwarmFormer-Small
SwarmFormer-Small is a lightweight variant of the SwarmFormer architecture, designed for efficient text classification with minimal computational requirements.
Model Details
Model Description
A compact version of SwarmFormer with:
- Token embedding layer with dropout (0.3)
- Two SwarmFormer layers
- Mean pooling followed by a classification head
- Optimized for shorter sequences (up to 256 tokens)
- Developed by: Jordan Legg, Mikus Sturmanis, Takara.ai
- Funded by: Takara.ai
- Shared by: Takara.ai
- Model type: Hierarchical transformer
- Language(s): English
- License: Not specified
- Finetuned from model: Not applicable (trained from scratch)
Model Sources
- Repository: https://github.com/takara-ai/SwarmFormer
- Paper: Takara.ai Research
- Demo: Not available
Uses
Direct Use
- Text classification
- Sentiment analysis
- Resource-constrained environments
Out-of-Scope Use
- Text generation
- Machine translation
- Tasks requiring >256 tokens
- Tasks requiring high precision
Training Details
Training Data
- Dataset: IMDB Movie Reviews (stanfordnlp/imdb)
- Size: 50,000 samples
- Augmentation techniques applied
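The base corpus can be loaded directly from the Hugging Face Hub; the augmentation pipeline itself is not described in this card, so the sketch below only covers loading the raw reviews.

# Minimal sketch: load the raw IMDB reviews (augmentation not shown).
from datasets import load_dataset

imdb = load_dataset("stanfordnlp/imdb")   # 25k labelled train + 25k labelled test reviews
train_texts = imdb["train"]["text"]
train_labels = imdb["train"]["label"]     # 0 = negative, 1 = positive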
Training Procedure
Model Architecture Details
Token Embedding Layer:
- Embedding layer (vocab_size → 128)
- Dropout rate: 0.3
Local Swarm Aggregator:
- Input dropout: 0.3
- Local MLP: Linear(128 → 128) → GELU → Dropout(0.3) → Linear(128 → 128)
- Gate network with GELU
Clustering Mechanism:
- Cluster size: 8 tokens
- Mean pooling per cluster
Global Cluster Attention:
- Q/K/V projections: Linear(128 → 128)
- Attention dropout: 0.3
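A minimal PyTorch sketch of the components listed above, written from this description alone; the class names, gating details, and tensor shapes are illustrative assumptions rather than the reference implementation.

# Illustrative sketch of the layer components above; names and gating details are assumptions.
import torch
import torch.nn as nn

class LocalSwarmAggregator(nn.Module):
    def __init__(self, d_model=128, dropout=0.3):
        super().__init__()
        self.input_dropout = nn.Dropout(dropout)
        # Local MLP: Linear(128 -> 128) -> GELU -> Dropout(0.3) -> Linear(128 -> 128)
        self.local_mlp = nn.Sequential(
            nn.Linear(d_model, d_model), nn.GELU(),
            nn.Dropout(dropout), nn.Linear(d_model, d_model),
        )
        # Gate network with GELU (sigmoid output for gating is an assumption)
        self.gate = nn.Sequential(
            nn.Linear(2 * d_model, d_model), nn.GELU(),
            nn.Linear(d_model, d_model), nn.Sigmoid(),
        )

    def forward(self, x):                          # x: (batch, seq_len, d_model)
        x = self.input_dropout(x)
        update = self.local_mlp(x)
        g = self.gate(torch.cat([x, update], dim=-1))
        return x + g * (update - x)                # gated local update

def cluster_mean_pool(x, cluster_size=8):
    # Split the sequence into clusters of 8 tokens and mean-pool each cluster
    b, t, d = x.shape
    return x.view(b, t // cluster_size, cluster_size, d).mean(dim=2)

class GlobalClusterAttention(nn.Module):
    def __init__(self, d_model=128, dropout=0.3):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        self.attn_dropout = nn.Dropout(dropout)

    def forward(self, clusters):                   # clusters: (batch, n_clusters, d_model)
        q, k, v = self.q(clusters), self.k(clusters), self.v(clusters)
        scores = q @ k.transpose(-2, -1) / (clusters.size(-1) ** 0.5)
        return self.attn_dropout(scores.softmax(dim=-1)) @ v

# Example shapes for seq_len=256, cluster_size=8
tokens = torch.randn(2, 256, 128)
clusters = cluster_mean_pool(LocalSwarmAggregator()(tokens))   # (2, 32, 128)
out = GlobalClusterAttention()(clusters)                       # (2, 32, 128)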
Training Hyperparameters
- Embedding dimension: 128
- Number of layers: 2
- Local update steps: 3
- Cluster size: 8
- Sequence length: 256
- Batch size: 96
- Learning rate: 4.76 × 10⁻⁴
- Weight decay: 0.0541
- Dropout: 0.30
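The card does not state the optimizer, so the following sketch only shows how these values could be wired into a standard setup; AdamW and the cross-entropy loss are assumptions.

# Hypothetical training setup using the hyperparameters above; AdamW is an assumption.
import torch

model = torch.nn.Linear(128, 2)   # placeholder standing in for the SwarmFormer-Small model
optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=4.76e-4,                   # learning rate
    weight_decay=0.0541,          # weight decay
)
loss_fn = torch.nn.CrossEntropyLoss()
batch_size, seq_len = 96, 256     # 96 sequences per batch, 256 tokens each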
Evaluation
Results
- Accuracy: 86.20%
- Precision: 83.46%
- Recall: 90.31%
- F1: 86.75%
- Inference time: 0.36s (25k samples)
- Mean batch latency: 3.67ms
- Throughput: 45k samples/s
- Peak memory: 8GB
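The reported numbers are standard binary-classification metrics over the 25k evaluation samples; they can be reproduced from model predictions with scikit-learn, as in this sketch (labels below are placeholders).

# Sketch: computing the reported metrics; y_true/y_pred are placeholder labels.
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

y_true = [1, 0, 1, 1, 0, 1]   # gold labels (0 = negative, 1 = positive)
y_pred = [1, 0, 1, 0, 0, 1]   # model predictions
accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, average="binary")
print(f"acc={accuracy:.4f}  precision={precision:.4f}  recall={recall:.4f}  f1={f1:.4f}")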
Technical Specifications
Compute Infrastructure
- GPU: NVIDIA RTX 2080 Ti
- VRAM: 8GB minimum
- Training time: 3.6 minutes
How to Get Started
from swarmformer import SwarmFormerModel

# SwarmFormer-Small configuration used for the IMDB model
model = SwarmFormerModel(
    vocab_size=30000,   # vocabulary size
    d_model=128,        # embedding dimension
    seq_len=256,        # maximum sequence length
    cluster_size=8,     # tokens per cluster
    num_layers=2,       # number of SwarmFormer layers
    T_local=3           # local update steps per layer
)
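A possible inference call, assuming the model behaves as a standard PyTorch module over a batch of token-ID tensors; the actual tokenizer and forward signature of the swarmformer package may differ.

# Hypothetical inference sketch; real token IDs would come from the package's tokenizer.
import torch

input_ids = torch.randint(0, 30000, (1, 256))  # one dummy sequence of 256 token IDs
with torch.no_grad():
    logits = model(input_ids)                  # assumed output shape: (1, num_classes)
    prediction = logits.argmax(dim=-1)         # 0 = negative, 1 = positive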
Citation
@article{legg2025swarmformer,
  title={SwarmFormer: Local-Global Hierarchical Attention via Swarming Token Representations},
  author={Legg, Jordan and Sturmanis, Mikus and {Takara.ai}},
  journal={Takara.ai Research},
  year={2025},
  url={https://takara.ai/papers/SwarmFormer-Local-Global-Hierarchical-Attention-via-Swarming-Token-Representations.pdf}
}
Model Card Authors
Jordan Legg, Mikus Sturmanis, Takara.ai Research Team