rootxhacker/arthemis-instruct

Model Details

This model is built on the neuromorphic Llama-SNN-LTC base architecture, which incorporates Spiking Neural Networks (SNNs) and Liquid Time Constants (LTCs), and is fine-tuned for instruction following on the Alpaca Cleaned dataset.

Model Type: Instruction-Following Language Model with Neuromorphic Enhancements
Supported Languages: English
Number of Parameters: 155.8M
Context Length: 1024 tokens
Base Architecture: Llama with SNN/LTC modifications
Base Model: rootxhacker/arthemis-lm
Fine-tuning Data: Alpaca Cleaned (~52K instruction-response pairs)

Architecture Features

Spiking Neural Networks in attention mechanisms for temporal processing
Liquid Time Constants in feed-forward layers for adaptive dynamics
12-layer transformer backbone with neuromorphic enhancements
RoPE positional encoding for sequence understanding
Custom surrogate gradient training for differentiable spike computation
Instruction-following fine-tuning for enhanced conversational abilities
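
To make the surrogate-gradient idea concrete, here is a minimal pure-Python sketch of a leaky integrate-and-fire neuron. It is an illustration only: the neuron model, the fast-sigmoid surrogate, and the `decay` and `slope` parameters are assumptions, not details taken from this model's implementation. Only the threshold of 1.0 comes from the configuration.

```python
SPIKING_THRESHOLD = 1.0  # matches spiking_threshold in the model configuration


def spike_forward(membrane: float, threshold: float = SPIKING_THRESHOLD) -> float:
    """Heaviside step: emit 1.0 once the membrane potential reaches the threshold."""
    return 1.0 if membrane >= threshold else 0.0


def spike_surrogate_grad(membrane: float, threshold: float = SPIKING_THRESHOLD,
                         slope: float = 5.0) -> float:
    """Fast-sigmoid surrogate derivative, used in the backward pass in place of
    the step function's zero-almost-everywhere gradient.

    `slope` is a hypothetical sharpness parameter, not a documented setting.
    """
    return 1.0 / (1.0 + slope * abs(membrane - threshold)) ** 2


def lif_run(inputs, decay: float = 0.9, threshold: float = SPIKING_THRESHOLD):
    """Run a leaky integrate-and-fire neuron over a sequence of input currents.

    `decay` is an assumed leak factor. Returns the list of emitted spikes.
    """
    membrane, spikes = 0.0, []
    for current in inputs:
        membrane = decay * membrane + current   # leak, then integrate
        spike = spike_forward(membrane, threshold)
        spikes.append(spike)
        membrane -= spike * threshold           # soft reset after a spike
    return spikes
```

With inputs `[0.6, 0.6, 0.6, 0.0]` the neuron fires only on the second step. The forward pass is a hard threshold, while the surrogate peaks at the threshold and decays away from it, which is what lets gradients flow through the spike during training.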

The main model configuration values are:

hidden_size = 768
intermediate_size = 2048
num_hidden_layers = 12
num_attention_heads = 12
num_key_value_heads = 12
max_position_embeddings = 1024
vocab_size = 50257
spiking_threshold = 1.0
ltc_hidden_size = 256
ltc_layers = 2
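
As a sketch, the values above can be collected in a small dataclass with a couple of sanity checks. The class name `ArthemisConfig` and its helper methods are hypothetical and are not part of the released code. Only the field names and values come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArthemisConfig:  # hypothetical container, not the model's actual config class
    hidden_size: int = 768
    intermediate_size: int = 2048
    num_hidden_layers: int = 12
    num_attention_heads: int = 12
    num_key_value_heads: int = 12
    max_position_embeddings: int = 1024
    vocab_size: int = 50257
    spiking_threshold: float = 1.0
    ltc_hidden_size: int = 256
    ltc_layers: int = 2

    @property
    def head_dim(self) -> int:
        """Per-head dimension implied by hidden_size and the head count."""
        return self.hidden_size // self.num_attention_heads

    @property
    def uses_grouped_query_attention(self) -> bool:
        """True only when there are fewer key/value heads than query heads."""
        return self.num_key_value_heads < self.num_attention_heads

    def validate(self) -> None:
        """Raise ValueError if the head counts do not divide evenly."""
        if self.hidden_size % self.num_attention_heads != 0:
            raise ValueError("hidden_size must be divisible by num_attention_heads")
        if self.num_attention_heads % self.num_key_value_heads != 0:
            raise ValueError("num_attention_heads must be divisible by num_key_value_heads")


config = ArthemisConfig()
config.validate()
```

With these values each attention head has dimension 64, and because the key/value head count equals the query head count, the configuration corresponds to standard multi-head attention rather than grouped-query attention.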

Usage

Install dependencies

pip install transformers torch numpy

Inference

The full inference code is available in this gist:

https://gist.github.com/harishsg993010/e632de8b15a3ab1ff03e3912f55109ea

Example code:

# Note: This model requires custom implementation due to SNN/LTC architecture
# Standard transformers library cannot load this model directly

# For custom loading, you'll need the specialized architecture:
import torch
from custom_model import LlamaSNNLTCModel
from transformers import AutoTokenizer

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/DialoGPT-small")
tokenizer.pad_token = tokenizer.eos_token

# Load the instruction-tuned model
model = LlamaSNNLTCModel.from_pretrained("rootxhacker/arthemis-instruct")

# For instruction-following generation
def generate_instruction_response(instruction, input_text="", model=None, tokenizer=None, max_length=150):
    model.eval()
    device = next(model.parameters()).device
    
    # Reset model states for clean generation
    model.reset_states()
    
    # Format prompt in Alpaca style
    if input_text.strip():
        prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
    else:
        prompt = f"### Instruction:\n{instruction}\n\n### Response:\n"
    
    inputs = tokenizer(prompt, return_tensors='pt').to(device)
    input_ids = inputs['input_ids']
    
    with torch.no_grad():
        for _ in range(max_length - input_ids.shape[1]):
            outputs = model(input_ids)
            logits = outputs['logits'][0, -1, :]
            
            # Sample with temperature for more natural responses
            logits = logits / 0.7
            probs = torch.softmax(logits, dim=-1)
            next_token = torch.multinomial(probs, 1)
            
            input_ids = torch.cat([input_ids, next_token.unsqueeze(0)], dim=-1)
            
            if next_token.item() == tokenizer.eos_token_id:
                break
    
    generated = tokenizer.decode(input_ids[0], skip_special_tokens=True)
    
    # Extract just the response part
    if "### Response:\n" in generated:
        response = generated.split("### Response:\n")[-1].strip()
        return response
    
    return generated

# Example usage
instruction = "Explain what artificial intelligence is in simple terms."
response = generate_instruction_response(instruction, model=model, tokenizer=tokenizer)
print(f"Instruction: {instruction}")
print(f"Response: {response}")
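
The prompt formatting and response extraction in the function above can also be factored into two standalone helpers, which makes them testable without loading the model. This is a sketch: the names `build_prompt` and `extract_response` are hypothetical, and the template strings are copied from the code above.

```python
RESPONSE_MARKER = "### Response:\n"


def build_prompt(instruction: str, input_text: str = "") -> str:
    """Format an instruction (and optional input) in the Alpaca-style template."""
    if input_text.strip():
        return (
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input_text}\n\n"
            f"{RESPONSE_MARKER}"
        )
    return f"### Instruction:\n{instruction}\n\n{RESPONSE_MARKER}"


def extract_response(generated: str) -> str:
    """Return the text after the last response marker, or the full text if absent."""
    if RESPONSE_MARKER in generated:
        return generated.split(RESPONSE_MARKER)[-1].strip()
    return generated


prompt = build_prompt("Translate to French.", "Good morning")
```

Keeping the marker in a single constant ensures that the prompt builder and the extractor cannot drift apart.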

Evaluation

I evaluated the model using the script in this gist: https://gist.github.com/harishsg993010/e3c31c2d2c8207384ee263627f990300

Results Comparison

Model	Params	Budget	HellaSwag	OBQA	WinoGrande	ARC_e	ARC_c	BoolQ	Avg
rootxhacker/arthemis-lm	155.8M	<$50	24.65	20.60	48.10	28.20	22.20	39.80	30.59
google/bert-large-uncased	336M	N/A	24.53	26.20	49.80	25.08	25.68	40.86	32.03
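
The Avg column appears to be the unweighted mean of the six benchmark scores. The short sketch below recomputes it from the table rows. The variable and function names are illustrative, and the scores are copied from the table.

```python
from statistics import mean

BENCHMARKS = ["HellaSwag", "OBQA", "WinoGrande", "ARC_e", "ARC_c", "BoolQ"]

# Scores copied from the results table, in the order of BENCHMARKS.
SCORES = {
    "rootxhacker/arthemis-lm": [24.65, 20.60, 48.10, 28.20, 22.20, 39.80],
    "google/bert-large-uncased": [24.53, 26.20, 49.80, 25.08, 25.68, 40.86],
}

# Avg column as reported in the table.
REPORTED_AVG = {
    "rootxhacker/arthemis-lm": 30.59,
    "google/bert-large-uncased": 32.03,
}


def average_score(scores):
    """Unweighted mean across benchmarks."""
    return mean(scores)


for name, scores in SCORES.items():
    computed = average_score(scores)
    # Allow for rounding to two decimals in the reported column.
    matches = abs(computed - REPORTED_AVG[name]) < 0.01
    print(f"{name}: computed={computed:.3f} reported={REPORTED_AVG[name]} match={matches}")
```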

Technical Specifications

Architecture: Llama + Spiking Neural Networks + Liquid Time Constants
Hidden Size: 768
Intermediate Size: 2048
Attention Heads: 12
Layers: 12
Max Position Embeddings: 1024
Vocabulary Size: 50,257
Spiking Threshold: 1.0
LTC Hidden Size: 256
Training Precision: FP32
Fine-tuning Dataset: Alpaca Cleaned (52K instructions)

Training Details

The model was fine-tuned from rootxhacker/arthemis-lm using:

Base Model: rootxhacker/arthemis-lm (pretrained neuromorphic LLM)
Dataset: Alpaca Cleaned (~52K instruction-response pairs)
Hardware: Google Colab Pro Plus (A100 GPU)
Training Steps: 5,000 steps
Batch Size: 4 with gradient accumulation
Learning Rate: 5e-5 (lower for fine-tuning)
Precision: FP32 for stability with neuromorphic components
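
To relate these numbers to dataset coverage, the sketch below computes the effective batch size and the approximate number of passes over the data. It assumes that "Training Steps" counts optimizer updates and that the dataset holds about 52,000 examples. The gradient accumulation factor is not stated above, so `grad_accum_steps` is a hypothetical parameter the reader must supply.

```python
def effective_batch_size(per_step_batch: int, grad_accum_steps: int) -> int:
    """Number of examples contributing to one optimizer update."""
    return per_step_batch * grad_accum_steps


def epochs_covered(optimizer_steps: int, per_step_batch: int,
                   grad_accum_steps: int, dataset_size: int) -> float:
    """Approximate passes over the dataset.

    Assumes each optimizer step consumes one effective batch.
    """
    examples_seen = optimizer_steps * effective_batch_size(per_step_batch, grad_accum_steps)
    return examples_seen / dataset_size


# grad_accum_steps=1 is a placeholder; the real value is not documented here.
coverage = epochs_covered(optimizer_steps=5000, per_step_batch=4,
                          grad_accum_steps=1, dataset_size=52_000)
print(f"Approximate epochs with no accumulation: {coverage:.2f}")
```

Under these assumptions, a larger accumulation factor scales the coverage proportionally.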

Key Features

Instruction Format: Uses Alpaca's structured instruction format
Response Generation: Optimized for helpful, accurate responses
Neuromorphic Preservation: Maintains SNN/LTC benefits during fine-tuning
Budget-Conscious: Additional fine-tuning cost under $10

Fine-tuning Process

The fine-tuning process involved:

Base Model Loading: Started from the pretrained arthemis-lm checkpoint
Data Formatting: Converted Alpaca instructions to proper format
Careful Training: Lower learning rate to preserve base model knowledge
State Management: Proper handling of SNN/LTC states during training
Validation: Continuous monitoring of instruction-following quality
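
The state-management step matters because recurrent components carry hidden state from one sequence to the next unless it is cleared. The sketch below illustrates this with a single-unit liquid time-constant cell that uses the fused Euler update from the LTC literature. The class `ScalarLTCCell`, its weights, and its step size are hypothetical and do not reflect this model's actual LTC layers. `reset_state` plays the same role as the `model.reset_states()` call in the inference code.

```python
import math


class ScalarLTCCell:
    """One-unit liquid time-constant cell (illustrative, not the model's layer)."""

    def __init__(self, tau=1.0, a=1.0, w_in=1.0, w_rec=0.5, bias=0.0, dt=0.1):
        self.tau, self.a = tau, a            # base time constant and bias state
        self.w_in, self.w_rec, self.bias = w_in, w_rec, bias
        self.dt = dt                         # assumed integration step
        self.state = 0.0

    def reset_state(self) -> None:
        """Clear the hidden state before processing an independent sequence."""
        self.state = 0.0

    def step(self, x: float) -> float:
        # Input- and state-dependent gate f in (0, 1).
        f = 1.0 / (1.0 + math.exp(-(self.w_in * x + self.w_rec * self.state + self.bias)))
        # Fused Euler update for dx/dt = -(1/tau + f) * x + f * a.
        self.state = (self.state + self.dt * f * self.a) / (
            1.0 + self.dt * (1.0 / self.tau + f)
        )
        return self.state


def run_sequence(cell: ScalarLTCCell, inputs, reset: bool = True):
    """Run the cell over inputs, optionally resetting its state first."""
    if reset:
        cell.reset_state()
    return [cell.step(x) for x in inputs]
```

Running the same inputs twice with a reset in between gives identical outputs, while skipping the reset lets the first sequence leak into the second.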

Limitations

Training Data: Limited to Alpaca Cleaned dataset scope
Context Length: Maximum 1024 tokens
Domain: Primarily English instructions
Custom Architecture: Requires specialized loading code
Scale: Smaller than commercial instruction models

Model Sources

Repository: [Coming Soon]
Base Model: rootxhacker/arthemis-lm
Hugging Face: rootxhacker/arthemis-instruct

Future Work

Scale instruction dataset for broader capabilities
Add multi-turn conversation support
Implement reinforcement learning from human feedback (RLHF)
Explore specialized instruction types (coding, math, reasoning)
Compare instruction-following efficiency with standard transformers

Acknowledgments

Special thanks to keeeeenw for the inspiration and open-source MicroLlama project, which demonstrated that impressive language models can be built on a budget. This work extends those principles to instruction-following capabilities while exploring neuromorphic computing approaches.

Thanks to the Stanford Alpaca team for the high-quality instruction dataset that made this fine-tuning possible.

Citation

@misc{arthemis-instruct-2024,
  title={Arthemis-Instruct: A Neuromorphic Instruction-Following Model with Spiking Neural Networks and Liquid Time Constants},
  author={rootxhacker},
  year={2024},
  howpublished={\url{https://huggingface.co/rootxhacker/arthemis-instruct}}
}

License

Apache License 2.0
