rootxhacker/arthemis-lm

Building capable language models shouldn't require massive corporate budgets. While the industry pushes toward increasingly large models, this project explores what's possible with neuromorphic architectures and limited resources.

I developed this 155.8M parameter Llama-SNN-LTC model with specific constraints:

  • Budget limit: Under $50 using Google Colab Pro Plus
  • From-scratch pretraining with fully open-source dataset
  • No fine-tuning or synthetic data generation from existing LLMs
  • Focus on architectural innovation over scale

Model Details

This project incorporates Spiking Neural Networks (SNNs) and Liquid Time Constants (LTCs) into the Llama architecture, creating a neuromorphic language model. I spent under $50 on Google Colab Pro Plus and pretrained on the first 1M samples of the BabyLM challenge dataset, roughly 100M tokens. On the benchmarks reported below, the model performs roughly on par with google/bert-large-uncased.

Model Type: Causal Language Model with Neuromorphic Enhancements
Supported Languages: English
Number of Parameters: 155.8M
Context Length: 1024 tokens
Base Architecture: Llama with SNN/LTC modifications
Training Data: BabyLM (vesteinn/babylm) - 1M samples (~100M tokens)

Architecture Features

  • Spiking Neural Networks in attention mechanisms for temporal processing
  • Liquid Time Constants in feed-forward layers for adaptive dynamics
  • 12-layer transformer backbone with neuromorphic enhancements
  • RoPE positional encoding for sequence understanding
  • Custom surrogate gradient training for differentiable spike computation
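
To make the spiking and surrogate-gradient items above concrete, here is a minimal, illustrative sketch of a Leaky Integrate-and-Fire (LIF) layer trained with a fast-sigmoid surrogate gradient. The class and parameter names are my own and the actual implementation in this repository may differ; the default threshold of 1.0 simply mirrors the spiking_threshold value listed below.

import torch
import torch.nn as nn

class SpikeFunction(torch.autograd.Function):
    """Heaviside spike in the forward pass, surrogate gradient in the backward pass."""

    @staticmethod
    def forward(ctx, membrane, threshold):
        ctx.save_for_backward(membrane)
        ctx.threshold = threshold
        return (membrane >= threshold).float()

    @staticmethod
    def backward(ctx, grad_output):
        (membrane,) = ctx.saved_tensors
        # Fast-sigmoid surrogate: d(spike)/d(membrane) ~ 1 / (1 + |u - threshold|)^2
        surrogate = 1.0 / (1.0 + torch.abs(membrane - ctx.threshold)) ** 2
        return grad_output * surrogate, None

class LIFNeuron(nn.Module):
    """Leaky Integrate-and-Fire layer with a soft reset, applied along the sequence dimension."""

    def __init__(self, threshold: float = 1.0, decay: float = 0.9):
        super().__init__()
        self.threshold = threshold
        self.decay = decay

    def forward(self, x):  # x: (batch, seq_len, hidden)
        membrane = torch.zeros_like(x[:, 0])
        spikes = []
        for t in range(x.size(1)):
            membrane = self.decay * membrane + x[:, t]        # leaky integration
            spike = SpikeFunction.apply(membrane, self.threshold)
            membrane = membrane - spike * self.threshold      # soft reset after firing
            spikes.append(spike)
        return torch.stack(spikes, dim=1)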

Here are my major model configurations:

hidden_size = 768
intermediate_size = 2048
num_hidden_layers = 12
num_attention_heads = 12
num_key_value_heads = 12
max_position_embeddings = 1024
vocab_size = 50257
spiking_threshold = 1.0
ltc_hidden_size = 256
ltc_layers = 2
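
These values should also be visible on the loaded config object. The snippet below assumes the repository exposes its configuration through AutoConfig with trust_remote_code=True and that the attribute names mirror the list above; the actual config class may differ.

from transformers import AutoConfig

config = AutoConfig.from_pretrained("rootxhacker/arthemis-lm", trust_remote_code=True)
print(config.hidden_size, config.num_hidden_layers, config.max_position_embeddings)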

Usage

Install dependencies

pip install transformers torch numpy

Inference

The following gist contains the full inference code:

https://gist.github.com/harishsg993010/e632de8b15a3ab1ff03e3912f55109ea
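
For quick reference, here is a minimal sketch of what loading and generation could look like through the standard transformers API. Whether the checkpoint loads directly via AutoModelForCausalLM, and whether trust_remote_code=True is needed for the custom SNN/LTC modules, are assumptions; treat the gist above as the authoritative script.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "rootxhacker/arthemis-lm"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True assumed, since the model uses custom SNN/LTC modules
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)
model.eval()

prompt = "The spiking neurons in this model"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50, do_sample=True, top_p=0.9)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))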

Evaluation

I ran the evaluation with the script in this gist: https://gist.github.com/harishsg993010/e3c31c2d2c8207384ee263627f990300
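
Benchmarks like these are commonly run with EleutherAI's lm-evaluation-harness; the command below only illustrates that route (task names follow harness version 0.4+) and is not necessarily what the gist above does.

pip install lm-eval
lm_eval --model hf \
  --model_args pretrained=rootxhacker/arthemis-lm,trust_remote_code=True \
  --tasks hellaswag,openbookqa,winogrande,arc_easy,arc_challenge,boolq \
  --batch_size 8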

Results Comparison

Model                        Params   Budget   HellaSwag   OBQA    WinoGrande   ARC_e   ARC_c   BoolQ   Avg
rootxhacker/arthemis-lm      155.8M   <$50     24.65       20.60   48.10        28.20   22.20   39.80   30.59
google/bert-large-uncased    336M     N/A      24.53       26.20   49.80        25.08   25.68   40.86   32.03

Observations

  • Budget Efficiency: The model reaches competitive performance on a budget of under $50, showing that meaningful language models can be built with limited resources.
  • Neuromorphic Advantages: The SNN-LTC architecture posts its strongest absolute score on WinoGrande (48.10%), close to the baseline, which may hint at a benefit from the temporal dynamics.
  • Parameter Efficiency: At 155.8M parameters, the model performs comparably to bert-large-uncased (336M parameters) at less than half the size.
  • Room for Improvement: More training data and compute would likely improve performance, but the current results support the neuromorphic approach.

Training Details

The model was pretrained from scratch using:

  • Dataset: BabyLM (vesteinn/babylm) - First 1M samples (~100M tokens)
  • Hardware: Google Colab Pro Plus (A100 GPU)
  • Training Steps: 20,000 steps
  • Batch Size: 8 with gradient accumulation
  • Learning Rate: 3e-4 with linear warmup
  • Precision: FP32 for stability with neuromorphic components
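
As a rough illustration of this setup, the skeleton below shows a 3e-4 run with linear warmup and gradient accumulation. The optimizer choice (AdamW), warmup length, accumulation factor, and gradient clipping are assumptions not stated above, and model / train_loader are presumed to come from your own data pipeline.

import torch
from transformers import get_linear_schedule_with_warmup

# `model` and `train_loader` are assumed to exist (e.g. the Llama-SNN-LTC model and a
# DataLoader over tokenized BabyLM samples); they are not defined in this sketch.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)           # AdamW is an assumption
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=1_000, num_training_steps=20_000     # warmup length is illustrative
)

accum_steps = 4  # effective batch = 8 x accum_steps; the accumulation factor is illustrative
model.train()
for step, batch in enumerate(train_loader):
    loss = model(**batch).loss / accum_steps         # causal-LM loss, kept in FP32
    loss.backward()
    if (step + 1) % accum_steps == 0:
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)      # clipping is an assumption
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()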

Key Innovations

  • Custom SNN Implementation: Leaky Integrate-and-Fire neurons with surrogate gradients
  • Liquid Time Constants: Adaptive time dynamics in feed-forward layers (see the sketch after this list)
  • Budget-Conscious Training: Optimized for maximum performance per dollar spent
  • Neuromorphic Language Modeling: To my knowledge, the first integration of SNNs and LTCs in a causal language model
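
To illustrate the adaptive time dynamics mentioned above, here is a minimal LTC-style cell: each hidden unit decays toward its target at a rate set by an input- and state-dependent time constant. The 256-unit hidden size matches ltc_hidden_size in the configuration, but the class name and exact equations are an illustration rather than the repository's implementation.

import torch
import torch.nn as nn

class LTCCell(nn.Module):
    """Illustrative liquid-time-constant update: the decay rate of the hidden state
    depends on the current input and state, so each unit adapts its own time scale."""

    def __init__(self, input_size: int, hidden_size: int, dt: float = 1.0):
        super().__init__()
        self.dt = dt
        self.input_map = nn.Linear(input_size, hidden_size)
        self.recurrent_map = nn.Linear(hidden_size, hidden_size)
        self.tau_map = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, x, h):
        # Input- and state-dependent time constant tau > 0 (softplus keeps it positive).
        tau = nn.functional.softplus(self.tau_map(torch.cat([x, h], dim=-1))) + 1e-3
        target = torch.tanh(self.input_map(x) + self.recurrent_map(h))
        # One explicit Euler step of dh/dt = (target - h) / tau.
        return h + self.dt * (target - h) / tau

cell = LTCCell(input_size=768, hidden_size=256)
h = torch.zeros(2, 256)                      # batch of 2 hidden states
h = cell(torch.randn(2, 768), h)             # one adaptive update step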

Future Work

  • Scale to larger datasets with increased compute budget
  • Explore different spiking neuron models (e.g., Adaptive LIF, Izhikevich)
  • Implement more sophisticated LTC architectures
  • Fine-tune for specific downstream tasks
  • Compare energy efficiency with standard transformers

Uses

This model can be used for:

  • Text generation and completion
  • Few-shot learning tasks
  • Research into neuromorphic language models
  • Educational purposes for understanding SNN/LTC architectures
  • Base model for fine-tuning on specific tasks

Limitations

  • Training Data: Limited to 100M tokens (much smaller than typical LLMs)
  • Context Length: Maximum 1024 tokens
  • Domain: Primarily trained on English text
  • Compute: Training limited by budget constraints
  • Performance: Lower than larger, more extensively trained models

Acknowledgments

Special thanks to keeeeenw for the inspiration and open-source MicroLlama project, which demonstrated that impressive language models can be built on a budget. This work builds upon those principles while exploring neuromorphic computing approaches to language modeling.

Citation

@misc{arthemis-lm-2024,
  title={Arthemis-LM: A Neuromorphic Language Model with Spiking Neural Networks and Liquid Time Constants},
  author={rootxhacker},
  year={2024},
  howpublished={\url{https://huggingface.co/rootxhacker/arthemis-lm}}
}

License

Apache License 2.0
