Model Details

This model builds upon the neuromorphic Llama-SNN-LTC base architecture, incorporating Spiking Neural Networks (SNNs) and Liquid Time Constants (LTCs), and fine-tunes it specifically for instruction following using the Alpaca Cleaned dataset.

Model Type: Instruction-Following Language Model with Neuromorphic Enhancements
Supported Languages: English
Number of Parameters: 155.8M
Context Length: 1024 tokens
Base Architecture: Llama with SNN/LTC modifications
Base Model: rootxhacker/arthemis-lm
Fine-tuning Data: Alpaca Cleaned (~52K instruction-response pairs)

Architecture Features

  • Spiking Neural Networks in attention mechanisms for temporal processing
  • Liquid Time Constants in feed-forward layers for adaptive dynamics
  • 12-layer transformer backbone with neuromorphic enhancements
  • RoPE positional encoding for sequence understanding
  • Custom surrogate gradient training for differentiable spike computation
  • Instruction-following fine-tuning for enhanced conversational abilities
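
To make the surrogate-gradient item above more concrete, here is a minimal pure-Python sketch; it is not the model's actual implementation. It assumes a hard threshold in the forward pass and the derivative of a fast sigmoid, x / (1 + |x|), as the stand-in gradient in the backward pass. The names `spike` and `surrogate_grad` and the `slope` value are hypothetical; the default threshold of 1.0 mirrors the spiking threshold reported in this card.

```python
def spike(membrane: float, threshold: float = 1.0) -> float:
    """Forward pass: emit 1.0 when the membrane potential reaches the threshold."""
    return 1.0 if membrane >= threshold else 0.0


def surrogate_grad(membrane: float, threshold: float = 1.0, slope: float = 10.0) -> float:
    """Backward pass: derivative of the fast sigmoid x / (1 + |x|) evaluated at
    x = slope * (membrane - threshold). It replaces the step function's true
    derivative, which is zero almost everywhere and would block learning."""
    x = slope * (membrane - threshold)
    return slope / (1.0 + abs(x)) ** 2


if __name__ == "__main__":
    # The surrogate gradient peaks at the threshold and decays on both sides.
    for v in (0.5, 0.99, 1.0, 1.5):
        print(v, spike(v), round(surrogate_grad(v), 4))
```

In a framework such as PyTorch the same pairing would typically live in a custom autograd function, with `spike` as the forward and `surrogate_grad` scaling the incoming gradient in the backward.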

The main model configuration values are:

hidden_size = 768
intermediate_size = 2048
num_hidden_layers = 12
num_attention_heads = 12
num_key_value_heads = 12
max_position_embeddings = 1024
vocab_size = 50257
spiking_threshold = 1.0
ltc_hidden_size = 256
ltc_layers = 2
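
As a quick sanity check on these values, the sketch below collects them in a small dataclass and derives a few quantities from them. The class name `ArthemisConfig` and its properties are hypothetical and are not part of the released code; only the field values come from this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArthemisConfig:  # hypothetical container; values copied from this card
    hidden_size: int = 768
    intermediate_size: int = 2048
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    num_key_value_heads: int = 12
    max_position_embeddings: int = 1024
    vocab_size: int = 50257
    spiking_threshold: float = 1.0
    ltc_hidden_size: int = 256
    ltc_layers: int = 2

    @property
    def head_dim(self) -> int:
        # Per-head width; hidden_size must divide evenly across heads.
        assert self.hidden_size % self.num_attention_heads == 0
        return self.hidden_size // self.num_attention_heads

    @property
    def kv_groups(self) -> int:
        # Query heads per key/value head; 1 means plain multi-head attention.
        return self.num_attention_heads // self.num_key_value_heads

    @property
    def embedding_params(self) -> int:
        # Size of one token-embedding matrix (vocab_size x hidden_size).
        return self.vocab_size * self.hidden_size


cfg = ArthemisConfig()
print(cfg.head_dim, cfg.kv_groups, cfg.embedding_params)
```

With these values each attention head is 64-dimensional, and because the key/value head count equals the query head count, attention is standard multi-head rather than grouped-query.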

Usage

Install dependencies

pip install transformers torch numpy

Inference

The full inference code is available in this gist:

https://gist.github.com/harishsg993010/e632de8b15a3ab1ff03e3912f55109ea

The snippet below walks through loading the model and generating a response:

# Note: This model requires custom implementation due to SNN/LTC architecture
# Standard transformers library cannot load this model directly

# For custom loading, you'll need the specialized architecture:
import torch
from custom_model import LlamaSNNLTCModel
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
tokenizer.pad_token = tokenizer.eos_token

# Load the instruction-tuned model
model = LlamaSNNLTCModel.from_pretrained("rootxhacker/arthemis-instruct")

# For instruction-following generation
# Note: max_length is the total sequence length (prompt tokens + generated tokens)
def generate_instruction_response(instruction, input_text="", model=None, tokenizer=None, max_length=150):
    model.eval()
    device = next(model.parameters()).device
    
    # Reset model states for clean generation
    model.reset_states()
    
    # Format prompt in Alpaca style
    if input_text.strip():
        prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    
    inputs = tokenizer(prompt, return_tensors='pt').to(device)
    input_ids = inputs['input_ids']
    
    with torch.no_grad():
        for _ in range(max_length - input_ids.shape[1]):
            outputs = model(input_ids)
            logits = outputs['logits'][0, -1, :]
            
            # Sample with temperature for more natural responses
            logits = logits / 0.7
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, 1)
            
            input_ids = torch.cat([input_ids, next_token.unsqueeze(0)], dim=-1)
            
            if next_token.item() == tokenizer.eos_token_id:
                break
    
    generated = tokenizer.decode(input_ids[0], skip_special_tokens=True)
    
    # Extract just the response part
    if "### Response:\n" in generated:
        response = generated.split("### Response:\n")[-1].strip()
        return response
    
    return generated

# Example usage
instruction = "Explain what artificial intelligence is in simple terms."
response = generate_instruction_response(instruction, model=model, tokenizer=tokenizer)
print(f"Instruction: {instruction}")
print(f"Response: {response}")

Evaluation

I evaluated the model with this script: https://gist.github.com/harishsg993010/e3c31c2d2c8207384ee263627f990300

Results Comparison

| Model | Params | Budget | HellaSwag | OBQA | WinoGrande | ARC_e | ARC_c | BoolQ | Avg |
|---|---|---|---|---|---|---|---|---|---|
| rootxhacker/arthemis-lm | 155.8M | <$50 | 24.65 | 20.60 | 48.10 | 28.20 | 22.20 | 39.80 | 30.59 |
| google/bert-large-uncased | 336M | N/A | 24.53 | 26.20 | 49.80 | 25.08 | 25.68 | 40.86 | 32.03 |
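
The Avg column appears to be the unweighted mean of the six task scores. The short check below recomputes it from the numbers in the table; the helper name `macro_average` is just for illustration, and the tolerance allows for the reported values being rounded to two decimals.

```python
# Scores copied from the results table, in order:
# HellaSwag, OBQA, WinoGrande, ARC_e, ARC_c, BoolQ.
RESULTS = {
    "rootxhacker/arthemis-lm": ([24.65, 20.60, 48.10, 28.20, 22.20, 39.80], 30.59),
    "google/bert-large-uncased": ([24.53, 26.20, 49.80, 25.08, 25.68, 40.86], 32.03),
}


def macro_average(scores):
    """Unweighted mean across tasks."""
    return sum(scores) / len(scores)


for name, (scores, reported) in RESULTS.items():
    avg = macro_average(scores)
    # Compare against the reported value, allowing for two-decimal rounding.
    matches = abs(avg - reported) < 0.01
    print(f"{name}: computed {avg:.3f}, reported {reported}, match={matches}")
```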

Technical Specifications

Architecture: Llama + Spiking Neural Networks + Liquid Time Constants
Hidden Size: 768
Intermediate Size: 2048
Attention Heads: 12
Layers: 12
Max Position Embeddings: 1024
Vocabulary Size: 50,257
Spiking Threshold: 1.0
LTC Hidden Size: 256
Training Precision: FP32
Fine-tuning Dataset: Alpaca Cleaned (~52K instruction-response pairs)

Training Details

The model was fine-tuned from rootxhacker/arthemis-lm using:

  • Base Model: rootxhacker/arthemis-lm (pretrained neuromorphic LLM)
  • Dataset: Alpaca Cleaned (~52K instruction-response pairs)
  • Hardware: Google Colab Pro Plus (A100 GPU)
  • Training Steps: 5,000 steps
  • Batch Size: 4 with gradient accumulation
  • Learning Rate: 5e-5 (lower for fine-tuning)
  • Precision: FP32 for stability with neuromorphic components
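
For a rough sense of how much data these settings cover, the arithmetic can be sketched as below. This card does not state the gradient accumulation factor, nor whether the 5,000 steps count optimizer updates or micro-batches, so the accumulation values in the loop are hypothetical and the function treats steps as optimizer updates.

```python
def samples_seen(steps: int, batch_size: int, accum_steps: int = 1) -> int:
    """Examples processed, treating `steps` as optimizer steps (an assumption)."""
    return steps * batch_size * accum_steps


def epochs(steps: int, batch_size: int, accum_steps: int, dataset_size: int) -> float:
    """Fraction (or multiple) of the dataset covered by the run."""
    return samples_seen(steps, batch_size, accum_steps) / dataset_size


DATASET_SIZE = 52_000  # approximate figure from this card
for accum in (1, 4, 8):  # hypothetical accumulation factors; the card gives none
    seen = samples_seen(5_000, 4, accum)
    print(accum, seen, round(epochs(5_000, 4, accum, DATASET_SIZE), 2))
```

Under these assumptions, an accumulation factor of 1 would cover less than half of the dataset, so the actual factor matters when interpreting the step count.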

Key Features

  • Instruction Format: Uses Alpaca's structured instruction format
  • Response Generation: Optimized for helpful, accurate responses
  • Neuromorphic Preservation: Maintains SNN/LTC benefits during fine-tuning
  • Budget-Conscious: Additional fine-tuning cost under $10
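
The instruction format can be sketched as a small formatting helper. It reuses the `### Instruction:` / `### Input:` / `### Response:` headers from the inference example in this card and the `instruction`, `input`, and `output` field names of Alpaca-style records. The function name `format_alpaca` is hypothetical, and whether the actual training pipeline appended an end-of-sequence token or any other text is not stated here.

```python
def format_alpaca(example: dict, include_response: bool = True) -> str:
    """Render one record with the same headers the inference prompt uses."""
    instruction = example["instruction"].strip()
    context = example.get("input", "").strip()
    parts = [f"### Instruction:\n{instruction}\n\n"]
    if context:
        # The Input section is only emitted when the record has extra context.
        parts.append(f"### Input:\n{context}\n\n")
    parts.append("### Response:\n")
    if include_response:
        # Training text includes the target; inference prompts stop at the header.
        parts.append(example.get("output", "").strip())
    return "".join(parts)


record = {  # made-up record for illustration, not taken from the dataset
    "instruction": "Translate to French.",
    "input": "Good morning",
    "output": "Bonjour",
}
print(format_alpaca(record))
```

Calling it with `include_response=False` yields the same prompt string the generation function builds, which keeps training and inference formats aligned.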

Fine-tuning Process

The fine-tuning process involved:

  1. Base Model Loading: Started from the pretrained arthemis-lm checkpoint
  2. Data Formatting: Converted Alpaca instructions to proper format
  3. Careful Training: Lower learning rate to preserve base model knowledge
  4. State Management: Proper handling of SNN/LTC states during training
  5. Validation: Continuous monitoring of instruction-following quality
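
The state-management step is easiest to see on a toy example. The sketch below is a scalar liquid-time-constant unit, not the model's real layer: it assumes the dynamics dx/dt = -(1/tau + f) * x + f * a with a sigmoid gate f, advanced by a semi-implicit Euler step. The class `ToyLTCCell`, its parameters, and the `run` helper are hypothetical; only the idea of calling `reset_states()` before each sequence comes from the inference code in this card.

```python
import math


class ToyLTCCell:
    """Scalar liquid-time-constant unit with an explicit, resettable state."""

    def __init__(self, tau: float = 1.0, a: float = 1.0, w: float = 1.0, dt: float = 0.1):
        self.tau, self.a, self.w, self.dt = tau, a, w, dt
        self.state = 0.0

    def reset_states(self) -> None:
        # Clear the carried-over state so sequences do not leak into each other.
        self.state = 0.0

    def step(self, inp: float) -> float:
        # Gate that depends on both the input and the current state.
        f = 1.0 / (1.0 + math.exp(-(self.w * inp + self.state)))
        # Semi-implicit Euler update of dx/dt = -(1/tau + f) * x + f * a.
        self.state = (self.state + self.dt * f * self.a) / (
            1.0 + self.dt * (1.0 / self.tau + f)
        )
        return self.state


def run(cell: ToyLTCCell, sequence) -> list:
    cell.reset_states()  # start every sequence from a clean state
    return [cell.step(x) for x in sequence]


cell = ToyLTCCell()
first = run(cell, [1.0, 0.5, -0.2])
second = run(cell, [1.0, 0.5, -0.2])
print(first == second)  # identical only because the state was reset in between
```

Skipping the reset would make the second pass start from the final state of the first, so the same input would produce different outputs across batches or prompts.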

Limitations

  • Training Data: Limited to Alpaca Cleaned dataset scope
  • Context Length: Maximum 1024 tokens
  • Domain: Primarily English instructions
  • Custom Architecture: Requires specialized loading code
  • Scale: Smaller than commercial instruction models
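
One practical consequence of the 1024-token limit is that long prompts need trimming before generation. The helper below is a minimal sketch of one way to do that, assuming left truncation (dropping the oldest tokens) is acceptable. The function name `fit_to_context` is hypothetical, and the plain list of integers stands in for real token ids.

```python
def fit_to_context(prompt_ids: list, max_new_tokens: int, context_length: int = 1024) -> list:
    """Keep the most recent prompt tokens so prompt + generation fits the window."""
    if not 0 < max_new_tokens < context_length:
        raise ValueError("max_new_tokens must be between 1 and context_length - 1")
    budget = context_length - max_new_tokens
    # Left-truncate: drop the oldest tokens and keep the tail of the prompt.
    return prompt_ids[-budget:]


ids = list(range(1500))  # stand-in token ids
kept = fit_to_context(ids, max_new_tokens=150)
print(len(kept), kept[0], kept[-1])
```

Left truncation can cut off the `### Instruction:` header of a very long prompt, so shortening the input text itself is usually the safer option when that is possible.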

Future Work

  • Scale instruction dataset for broader capabilities
  • Add multi-turn conversation support
  • Implement reinforcement learning from human feedback (RLHF)
  • Explore specialized instruction types (coding, math, reasoning)
  • Compare instruction-following efficiency with standard transformers

Acknowledgments

Special thanks to keeeeenw for the inspiration and open-source MicroLlama project, which demonstrated that impressive language models can be built on a budget. This work extends those principles to instruction-following capabilities while exploring neuromorphic computing approaches.

Thanks to the Stanford Alpaca team for the high-quality instruction dataset that made this fine-tuning possible.

Citation

@misc{arthemis-instruct-2024,
  title={Arthemis-Instruct: A Neuromorphic Instruction-Following Model with Spiking Neural Networks and Liquid Time Constants},
  author={rootxhacker},
  year={2024},
  howpublished={\url{https://huggingface.co/rootxhacker/arthemis-instruct}}
}

License

Apache License 2.0
