rootxhacker/arthemis-lm
Building capable language models shouldn't require massive corporate budgets. While the industry pushes toward increasingly large models, this project explores what's possible with neuromorphic architectures and limited resources.
I developed this 155.8M parameter Llama-SNN-LTC model with specific constraints:
- Budget limit: Under $50 using Google Colab Pro Plus
- From-scratch pretraining on a fully open-source dataset
- No fine-tuning or synthetic data generation from existing LLMs
- Focus on architectural innovation over scale
Model Details
This project incorporates Spiking Neural Networks (SNNs) and Liquid Time Constants (LTCs) into the Llama architecture, creating a neuromorphic language model. I spent under $50 on Google Colab Pro Plus and used the first 1M samples from the BabyLM challenge dataset, which amount to approximately 100M tokens. On the benchmarks below, the model performs roughly on par with google/bert-large-uncased.
Model Type: Causal Language Model with Neuromorphic Enhancements
Supported Languages: English
Number of Parameters: 155.8M
Context Length: 1024 tokens
Base Architecture: Llama with SNN/LTC modifications
Training Data: BabyLM (vesteinn/babylm) - 1M samples (~100M tokens)
Architecture Features
- Spiking Neural Networks in attention mechanisms for temporal processing
- Liquid Time Constants in feed-forward layers for adaptive dynamics
- 12-layer transformer backbone with neuromorphic enhancements
- RoPE positional encoding for sequence understanding
- Custom surrogate gradient training for differentiable spike computation (illustrated in the sketch below)
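To make the surrogate-gradient mechanism concrete, here is a minimal PyTorch sketch of a Leaky Integrate-and-Fire spike function that applies a hard threshold in the forward pass but propagates a smooth fast-sigmoid surrogate in the backward pass. This illustrates the general technique rather than the exact implementation used in this model (see the inference gist below for the real code).

```python
import torch

class SpikeFunction(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, membrane_potential, threshold=1.0):
        ctx.save_for_backward(membrane_potential)
        ctx.threshold = threshold
        return (membrane_potential >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane_potential,) = ctx.saved_tensors
        # Fast-sigmoid surrogate: d(spike)/dv ~ 1 / (1 + |v - threshold|)^2
        surrogate = 1.0 / (1.0 + torch.abs(membrane_potential - ctx.threshold)) ** 2
        return grad_output * surrogate, None

def lif_step(x, membrane, decay=0.9, threshold=1.0):
    """One Leaky Integrate-and-Fire step: leak, integrate input, emit spikes, soft reset."""
    membrane = decay * membrane + x
    spikes = SpikeFunction.apply(membrane, threshold)
    membrane = membrane - spikes * threshold
    return spikes, membrane
```

Units of this kind sit inside the attention path, so gradients can flow through the spike nonlinearity during pretraining.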
Here are my major model configurations:
hidden_size = 768
intermediate_size = 2048
num_hidden_layers = 12
num_attention_heads = 12
num_key_value_heads = 12
max_position_embeddings = 1024
vocab_size = 50257
spiking_threshold = 1.0
ltc_hidden_size = 256
ltc_layers = 2
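For convenience, the same values can be collected into a plain Python configuration object. The class name and field layout below are illustrative only; the released code may organize its config differently.

```python
from dataclasses import dataclass

@dataclass
class ArthemisConfig:
    # Transformer backbone (Llama-style)
    hidden_size: int = 768
    intermediate_size: int = 2048
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    num_key_value_heads: int = 12
    max_position_embeddings: int = 1024
    vocab_size: int = 50257
    # Neuromorphic additions
    spiking_threshold: float = 1.0
    ltc_hidden_size: int = 256
    ltc_layers: int = 2

config = ArthemisConfig()
```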
Usage
Install dependencies
pip install transformers torch numpy
Inference
This gist contains the full inference code:
https://gist.github.com/harishsg993010/e632de8b15a3ab1ff03e3912f55109ea
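For orientation, a typical transformers loading-and-generation call would look roughly like the sketch below. It assumes the repository exposes its custom architecture via trust_remote_code, so treat it as an approximation and defer to the gist above for the working code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rootxhacker/arthemis-lm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumption: the repo ships custom modeling code loadable via trust_remote_code
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```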
Evaluation
I evaluated the model with the script in this gist: https://gist.github.com/harishsg993010/e3c31c2d2c8207384ee263627f990300
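For context, zero-shot multiple-choice benchmarks such as HellaSwag, ARC, and OpenBookQA are usually scored by picking the candidate completion with the highest length-normalized log-likelihood under the model. The sketch below shows that scoring pattern in plain PyTorch; it is my hedged approximation of the general method, not a copy of the evaluation gist.

```python
import torch

def score_choice(model, tokenizer, context, choice):
    """Average per-token log-likelihood of `choice` given `context`."""
    # Assumption: tokenizing context + choice splits cleanly at the boundary
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=full_ids).logits
    # Log-probability of every token given the tokens before it
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lls = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the positions that belong to the choice continuation
    return token_lls[:, ctx_len - 1:].mean().item()

def predict(model, tokenizer, context, choices):
    """Return the index of the candidate with the highest normalized log-likelihood."""
    scores = [score_choice(model, tokenizer, context, c) for c in choices]
    return max(range(len(choices)), key=lambda i: scores[i])
```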
Results Comparison
Model | Params | Budget | HellaSwag | OBQA | WinoGrande | ARC_e | ARC_c | BoolQ | Avg |
---|---|---|---|---|---|---|---|---|---|
rootxhacker/arthemis-lm | 155.8M | <$50 | 24.65 | 20.60 | 48.10 | 28.20 | 22.20 | 39.80 | 30.59 |
google/bert-large-uncased | 336M | N/A | 24.53 | 26.20 | 49.80 | 25.08 | 25.68 | 40.86 | 32.03 |
Observations
- Budget Efficiency: The model reaches competitive performance on a budget of under $50, demonstrating that meaningful language models can be built with limited resources.
- Neuromorphic Advantages: WinoGrande (48.10%) is the model's strongest result, coming within two points of the much larger BERT baseline, and the SNN-LTC model edges out BERT on HellaSwag and ARC_e, consistent with the temporal dynamics contributing to reasoning at this scale.
- Parameter Efficiency: At 155.8M parameters, the model performs comparably to bert-large-uncased (336M parameters) at less than half the size.
- Room for Improvement: More training data and compute would likely improve performance, but the current results validate the neuromorphic approach.
Technical Specifications
Architecture: Llama + Spiking Neural Networks + Liquid Time Constants
Hidden Size: 768
Intermediate Size: 2048
Attention Heads: 12
Layers: 12
Max Position Embeddings: 1024
Vocabulary Size: 50,257
Spiking Threshold: 1.0
LTC Hidden Size: 256
Training Precision: FP32
Training Details
The model was pretrained from scratch using:
- Dataset: BabyLM (vesteinn/babylm) - First 1M samples (~100M tokens)
- Hardware: Google Colab Pro Plus (A100 GPU)
- Training Steps: 20,000 steps
- Batch Size: 8 with gradient accumulation
- Learning Rate: 3e-4 with linear warmup
- Precision: FP32 for stability with neuromorphic components
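A compact sketch of the optimizer and schedule described above is shown below. It is illustrative only: `model` and `train_loader` are assumed to exist already, and the warmup length and gradient-accumulation factor are assumptions, since they are not specified above.

```python
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

TOTAL_STEPS = 20_000
WARMUP_STEPS = 1_000        # assumption: warmup length not stated above
GRAD_ACCUM_STEPS = 4        # assumption: effective batch = 8 x accumulation

optimizer = AdamW(model.parameters(), lr=3e-4)
scheduler = get_linear_schedule_with_warmup(optimizer, WARMUP_STEPS, TOTAL_STEPS)

model.train()
for step, batch in enumerate(train_loader):
    # Assumption: the model accepts `labels` and shifts them internally like HF causal LMs
    outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
    loss = outputs.loss / GRAD_ACCUM_STEPS   # FP32 throughout, no autocast
    loss.backward()
    if (step + 1) % GRAD_ACCUM_STEPS == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```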
Key Innovations
- Custom SNN Implementation: Leaky Integrate-and-Fire neurons with surrogate gradients
- Liquid Time Constants: Adaptive time dynamics in feed-forward layers (see the sketch after this list)
- Budget-Conscious Training: Optimized for maximum performance per dollar spent
- Neuromorphic Language Modeling: First known integration of SNNs and LTCs in a causal language model
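To make the liquid-time-constant idea concrete, here is a minimal sketch of an LTC-style cell in which each hidden unit relaxes toward an input-driven target with a learned, input-dependent time constant. It is a simplified illustration of the concept, not the exact layer used in this model.

```python
import torch
import torch.nn as nn

class LTCCell(nn.Module):
    """Hidden state relaxes toward an input-driven target with a learned,
    input-dependent time constant (a discretized liquid-time-constant update)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.input_map = nn.Linear(input_size, hidden_size)
        self.recurrent_map = nn.Linear(hidden_size, hidden_size)
        self.tau_map = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h, dt: float = 1.0):
        target = torch.tanh(self.input_map(x) + self.recurrent_map(h))
        # Per-unit time constant gated by the current input and state;
        # the +1.0 keeps tau >= 1 so the Euler step stays stable for dt = 1
        tau = nn.functional.softplus(self.tau_map(torch.cat([x, h], dim=-1))) + 1.0
        # Explicit Euler step of dh/dt = (target - h) / tau
        return h + (dt / tau) * (target - h)
```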
Future Work
- Scale to larger datasets with increased compute budget
- Explore different spiking neuron models (e.g., Adaptive LIF, Izhikevich)
- Implement more sophisticated LTC architectures
- Fine-tune for specific downstream tasks
- Compare energy efficiency with standard transformers
Model Sources
- Repository: [Coming Soon]
- Paper: [In Progress]
- Hugging Face: rootxhacker/arthemis-lm
Uses
This model can be used for:
- Text generation and completion
- Few-shot learning tasks
- Research into neuromorphic language models
- Educational purposes for understanding SNN/LTC architectures
- Base model for fine-tuning on specific tasks
Limitations
- Training Data: Limited to 100M tokens (much smaller than typical LLMs)
- Context Length: Maximum 1024 tokens
- Domain: Primarily trained on English text
- Compute: Training limited by budget constraints
- Performance: Lower than larger, more extensively trained models
Acknowledgments
Special thanks to keeeeenw for the inspiration and open-source MicroLlama project, which demonstrated that impressive language models can be built on a budget. This work builds upon those principles while exploring neuromorphic computing approaches to language modeling.
Citation
@misc{arthemis-lm-2024,
title={Arthemis-LM: A Neuromorphic Language Model with Spiking Neural Networks and Liquid Time Constants},
author={rootxhacker},
year={2024},
howpublished={\url{https://huggingface.co/rootxhacker/arthemis-lm}}
}
License
Apache License 2.0