rootxhacker/arthemis-lm
Building capable language models shouldn't require massive corporate budgets. While the industry pushes toward increasingly large models, this project explores what's possible with neuromorphic architectures and limited resources.
I developed this 155.8M parameter Llama-SNN-LTC model with specific constraints:
- Budget limit: Under $50 using Google Colab Pro Plus
- From-scratch pretraining on a fully open-source dataset
- No fine-tuning or synthetic data generation from existing LLMs
- Focus on architectural innovation over scale
Model Details
This project incorporates Spiking Neural Networks (SNNs) and Liquid Time Constants (LTCs) into the Llama architecture, creating a neuromorphic language model. I spent under $50 on Google Colab Pro Plus and used the first 1M samples from the BabyLM challenge dataset, which amount to approximately 100M tokens. On the benchmarks below, the model performs roughly on par with google/bert-large-uncased.
Model Type: Causal Language Model with Neuromorphic Enhancements
Supported Languages: English
Number of Parameters: 155.8M
Context Length: 1024 tokens
Base Architecture: Llama with SNN/LTC modifications
Training Data: BabyLM (vesteinn/babylm) - 1M samples (~100M tokens)
Architecture Features
- Spiking Neural Networks in attention mechanisms for temporal processing
- Liquid Time Constants in feed-forward layers for adaptive dynamics
- 12-layer transformer backbone with neuromorphic enhancements
- RoPE positional encoding for sequence understanding
- Custom surrogate gradient training for differentiable spike computation (illustrated in the sketch below)
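To make the surrogate-gradient mechanism concrete, here is a minimal PyTorch sketch of a Leaky Integrate-and-Fire spike function that applies a hard threshold in the forward pass but propagates a smooth fast-sigmoid surrogate in the backward pass. This illustrates the general technique rather than the exact implementation used in this model (see the inference gist below for the real code).

```python
import torch

class SpikeFunction(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, membrane_potential, threshold=1.0):
        ctx.save_for_backward(membrane_potential)
        ctx.threshold = threshold
        return (membrane_potential >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane_potential,) = ctx.saved_tensors
        # Fast-sigmoid surrogate: d(spike)/dv ~ 1 / (1 + |v - threshold|)^2
        surrogate = 1.0 / (1.0 + torch.abs(membrane_potential - ctx.threshold)) ** 2
        return grad_output * surrogate, None

def lif_step(x, membrane, decay=0.9, threshold=1.0):
    """One Leaky Integrate-and-Fire step: leak, integrate input, emit spikes, soft reset."""
    membrane = decay * membrane + x
    spikes = SpikeFunction.apply(membrane, threshold)
    membrane = membrane - spikes * threshold
    return spikes, membrane
```

Units of this kind sit inside the attention path, so gradients can flow through the spike nonlinearity during pretraining.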
Here are my major model configurations:
hidden_size = 768
intermediate_size = 2048
num_hidden_layers = 12
num_attention_heads = 12
num_key_value_heads = 12
max_position_embeddings = 1024
vocab_size = 50257
spiking_threshold = 1.0
ltc_hidden_size = 256
ltc_layers = 2
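For convenience, the same values can be collected into a plain Python configuration object. The class name and field layout below are illustrative only; the released code may organize its config differently.

```python
from dataclasses import dataclass

@dataclass
class ArthemisConfig:
    # Transformer backbone (Llama-style)
    hidden_size: int = 768
    intermediate_size: int = 2048
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    num_key_value_heads: int = 12
    max_position_embeddings: int = 1024
    vocab_size: int = 50257
    # Neuromorphic additions
    spiking_threshold: float = 1.0
    ltc_hidden_size: int = 256
    ltc_layers: int = 2

config = ArthemisConfig()
```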
Usage
Install dependencies
pip install transformers torch numpy
Inference
This gist contains the full inference code:
https://gist.github.com/harishsg993010/e632de8b15a3ab1ff03e3912f55109ea
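For orientation, a typical transformers loading-and-generation call would look roughly like the sketch below. It assumes the repository exposes its custom architecture via trust_remote_code, so treat it as an approximation and defer to the gist above for the working code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rootxhacker/arthemis-lm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# Assumption: the repo ships custom modeling code loadable via trust_remote_code
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

prompt = "The quick brown fox"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```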
Evaluation
I evaluated the model with the script in this gist: https://gist.github.com/harishsg993010/e3c31c2d2c8207384ee263627f990300
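For context, zero-shot multiple-choice benchmarks such as HellaSwag, ARC, and OpenBookQA are usually scored by picking the candidate completion with the highest length-normalized log-likelihood under the model. The sketch below shows that scoring pattern in plain PyTorch; it is my hedged approximation of the general method, not a copy of the evaluation gist.

```python
import torch

def score_choice(model, tokenizer, context, choice):
    """Average per-token log-likelihood of `choice` given `context`."""
    # Assumption: tokenizing context + choice splits cleanly at the boundary
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + choice, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(input_ids=full_ids).logits
    # Log-probability of every token given the tokens before it
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = full_ids[:, 1:]
    token_lls = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    # Keep only the positions that belong to the choice continuation
    return token_lls[:, ctx_len - 1:].mean().item()

def predict(model, tokenizer, context, choices):
    """Return the index of the candidate with the highest normalized log-likelihood."""
    scores = [score_choice(model, tokenizer, context, c) for c in choices]
    return max(range(len(choices)), key=lambda i: scores[i])
```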
Results Comparison
Model | Params | Budget | HellaSwag | OBQA | WinoGrande | ARC_e | ARC_c | BoolQ | Avg |
---|---|---|---|---|---|---|---|---|---|
rootxhacker/arthemis-lm | 155.8M | <$50 | 24.65 | 20.60 | 48.10 | 28.20 | 22.20 | 39.80 | 30.59 |
google/bert-large-uncased | 336M | N/A | 24.53 | 26.20 | 49.80 | 25.08 | 25.68 | 40.86 | 32.03 |
Observations
- Budget Efficiency: The model reaches competitive performance on a budget of under $50, demonstrating that meaningful language models can be built with limited resources.
- Neuromorphic Advantages: WinoGrande (48.10%) is the model's strongest result, coming within two points of the much larger BERT baseline, and the SNN-LTC model edges out BERT on HellaSwag and ARC_e, consistent with the temporal dynamics contributing to reasoning at this scale.
- Parameter Efficiency: At 155.8M parameters, the model performs comparably to bert-large-uncased (336M parameters) at less than half the size.
- Room for Improvement: More training data and compute would likely improve performance, but the current results validate the neuromorphic approach.
Technical Specifications
Architecture: Llama + Spiking Neural Networks + Liquid Time Constants
Hidden Size: 768
Intermediate Size: 2048
Attention Heads: 12
Layers: 12
Max Position Embeddings: 1024
Vocabulary Size: 50,257
Spiking Threshold: 1.0
LTC Hidden Size: 256
Training Precision: FP32
Training Details
The model was pretrained from scratch using:
- Dataset: BabyLM (vesteinn/babylm) - First 1M samples (~100M tokens)
- Hardware: Google Colab Pro Plus (A100 GPU)
- Training Steps: 20,000 steps
- Batch Size: 8 with gradient accumulation
- Learning Rate: 3e-4 with linear warmup
- Precision: FP32 for stability with neuromorphic components
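A compact sketch of the optimizer and schedule described above is shown below. It is illustrative only: `model` and `train_loader` are assumed to exist already, and the warmup length and gradient-accumulation factor are assumptions, since they are not specified above.

```python
import torch
from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

TOTAL_STEPS = 20_000
WARMUP_STEPS = 1_000        # assumption: warmup length not stated above
GRAD_ACCUM_STEPS = 4        # assumption: effective batch = 8 x accumulation

optimizer = AdamW(model.parameters(), lr=3e-4)
scheduler = get_linear_schedule_with_warmup(optimizer, WARMUP_STEPS, TOTAL_STEPS)

model.train()
for step, batch in enumerate(train_loader):
    # Assumption: the model accepts `labels` and shifts them internally like HF causal LMs
    outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
    loss = outputs.loss / GRAD_ACCUM_STEPS   # FP32 throughout, no autocast
    loss.backward()
    if (step + 1) % GRAD_ACCUM_STEPS == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```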
Key Innovations
- Custom SNN Implementation: Leaky Integrate-and-Fire neurons with surrogate gradients
- Liquid Time Constants: Adaptive time dynamics in feed-forward layers (see the sketch after this list)
- Budget-Conscious Training: Optimized for maximum performance per dollar spent
- Neuromorphic Language Modeling: First known integration of SNNs and LTCs in a causal language model
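To make the liquid-time-constant idea concrete, here is a minimal sketch of an LTC-style cell in which each hidden unit relaxes toward an input-driven target with a learned, input-dependent time constant. It is a simplified illustration of the concept, not the exact layer used in this model.

```python
import torch
import torch.nn as nn

class LTCCell(nn.Module):
    """Hidden state relaxes toward an input-driven target with a learned,
    input-dependent time constant (a discretized liquid-time-constant update)."""

    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.input_map = nn.Linear(input_size, hidden_size)
        self.recurrent_map = nn.Linear(hidden_size, hidden_size)
        self.tau_map = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h, dt: float = 1.0):
        target = torch.tanh(self.input_map(x) + self.recurrent_map(h))
        # Per-unit time constant gated by the current input and state;
        # the +1.0 keeps tau >= 1 so the Euler step stays stable for dt = 1
        tau = nn.functional.softplus(self.tau_map(torch.cat([x, h], dim=-1))) + 1.0
        # Explicit Euler step of dh/dt = (target - h) / tau
        return h + (dt / tau) * (target - h)
```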
Future Work
- Scale to larger datasets with increased compute budget
- Explore different spiking neuron models (e.g., Adaptive LIF, Izhikevich)
- Implement more sophisticated LTC architectures
- Fine-tune for specific downstream tasks
- Compare energy efficiency with standard transformers
Model Sources
- Repository: [Coming Soon]
- Paper: [In Progress]
- Hugging Face: rootxhacker/arthemis-lm
Uses
This model can be used for:
- Text generation and completion
- Few-shot learning tasks
- Research into neuromorphic language models
- Educational purposes for understanding SNN/LTC architectures
- Base model for fine-tuning on specific tasks
Limitations
- Training Data: Limited to 100M tokens (much smaller than typical LLMs)
- Context Length: Maximum 1024 tokens
- Domain: Primarily trained on English text
- Compute: Training limited by budget constraints
- Performance: Lower than larger, more extensively trained models
Acknowledgments
Special thanks to keeeeenw for the inspiration and open-source MicroLlama project, which demonstrated that impressive language models can be built on a budget. This work builds upon those principles while exploring neuromorphic computing approaches to language modeling.
Citation
@misc{arthemis-lm-2024,
title={Arthemis-LM: A Neuromorphic Language Model with Spiking Neural Networks and Liquid Time Constants},
author={rootxhacker},
year={2024},
howpublished={\url{https://huggingface.co/rootxhacker/arthemis-lm}}
}
License
Apache License 2.0