xLSTM-7B

This xLSTM-7B was pre-trained on DCLM and selected high-quality data, for a total of approximately 2.3 T tokens, using the xlstm-jax framework.

How to use it

First, install xlstm, which now uses the mlstm_kernels package for its Triton kernels (tested on Python 3.11):

pip install xlstm
pip install accelerate
pip install 'transformers @ git+https://github.com/huggingface/transformers.git@main'

If you get an error regarding the Triton library, install it from source:

pip install 'triton @ git+https://github.com/triton-lang/triton.git@main'
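
To verify the installation, a quick import check can help (this snippet is a suggestion, not part of the original instructions):

import transformers
import triton
import xlstm  # just checking that the package imports

print("transformers:", transformers.__version__)
print("triton:", triton.__version__)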

Use the model as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

xlstm = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b", device_map="auto")

# the tokenizer is a fork of the EleutherAI/gpt-neox-20b tokenizer
tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")

# Tokenize the prompt and move it to the model's device
tokens = tokenizer("Explain quantum computing in simple terms.", return_tensors='pt')['input_ids'].to(xlstm.device)

# Get the BOS token ID from the tokenizer
bos_id = tokenizer.bos_token_id

# Prepend BOS
bos_tensor = torch.tensor([[bos_id]], device=tokens.device, dtype=tokens.dtype)
tokens_with_bos = torch.cat([bos_tensor, tokens], dim=1)

out = xlstm.generate(tokens_with_bos, max_new_tokens=20)

print(tokenizer.decode(out[0]))
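
Since the tokenize, prepend-BOS, and generate steps recur in the native-kernel example below, it can be convenient to wrap them in a small helper (a sketch; generate_with_bos is a hypothetical name, not part of the library):

import torch

def generate_with_bos(model, tokenizer, prompt, **generate_kwargs):
    # Tokenize the prompt on the model's device
    ids = tokenizer(prompt, return_tensors="pt")["input_ids"].to(model.device)
    # Prepend the BOS token before generating
    bos = torch.tensor([[tokenizer.bos_token_id]], device=ids.device, dtype=ids.dtype)
    out = model.generate(torch.cat([bos, ids], dim=1), **generate_kwargs)
    return tokenizer.decode(out[0])

print(generate_with_bos(xlstm, tokenizer, "Explain quantum computing in simple terms.", max_new_tokens=20))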

If you cannot or do not want to use the Triton kernels, you can switch to the native PyTorch implementations:

from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
import torch

xlstm_config = AutoConfig.from_pretrained("NX-AI/xLSTM-7b")
xlstm_config.step_kernel = "native"
xlstm_config.chunkwise_kernel = "chunkwise--native_autograd"
xlstm_config.sequence_kernel = "native_sequence__native"

xlstm = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b",
                                             config=xlstm_config, device_map="auto")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")

# Your prompt
prompt = "Explain quantum computing in simple terms."

# Tokenize and send to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt")['input_ids'].to(xlstm.device)

# Get the BOS token ID from the tokenizer
bos_id = tokenizer.bos_token_id

# Prepend BOS
bos_tensor = torch.tensor([[bos_id]], device=xlstm.device, dtype=inputs.dtype)
tokens_with_bos = torch.cat([bos_tensor, inputs], dim=1)

# Generate
outputs = xlstm.generate(
    tokens_with_bos,
    max_new_tokens=200,   # adjust for output length
    temperature=0.7,      # randomness
    top_p=0.9,             # nucleus sampling
    do_sample=True
)

# Decode and print
print(tokenizer.decode(outputs[0]))

# verify selected kernels
from pprint import pprint
pprint(xlstm.backbone.blocks[0].mlstm_layer.config)
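
Because the native kernels are plain PyTorch, they should in principle also run without a GPU, although this is untested here and will be slow for a 7B model. A minimal CPU variant of the loading step (an assumption, not from the model card):

from transformers import AutoConfig, AutoModelForCausalLM

xlstm_config_cpu = AutoConfig.from_pretrained("NX-AI/xLSTM-7b")
xlstm_config_cpu.step_kernel = "native"
xlstm_config_cpu.chunkwise_kernel = "chunkwise--native_autograd"
xlstm_config_cpu.sequence_kernel = "native_sequence__native"

# No device_map: the weights stay on CPU (roughly 28 GB of RAM in float32)
xlstm_cpu = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b", config=xlstm_config_cpu)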

Speed results

Generation speed using torch.cuda.graph and torch.compile optimizations on one NVIDIA H100 (see the generation-speed plot).
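
For a rough tokens-per-second number on your own GPU, a plain wall-clock measurement can be used (a sketch without the torch.cuda.graph / torch.compile optimizations used for the plot):

import time
import torch

ids = tokenizer("Explain quantum computing in simple terms.", return_tensors="pt")["input_ids"].to(xlstm.device)

torch.cuda.synchronize()
start = time.perf_counter()
out = xlstm.generate(ids, max_new_tokens=100)
torch.cuda.synchronize()
elapsed = time.perf_counter() - start

# tokens generated beyond the prompt, divided by wall-clock time
print(f"{(out.shape[1] - ids.shape[1]) / elapsed:.1f} tokens/s")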

Performance

MMLU over training tokens (see plot).

Using lm_eval (EleutherAI's LM Evaluation Harness) in the Leaderboard-v2 settings:

| BBH | MMLU-Pro | Math | MUSR | GPQA | IfEval |
|-----|----------|------|------|------|--------|
| 0.381 | 0.242 | 0.036 | 0.379 | 0.280 | 0.244 |
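
For reference, a rough sketch of how scores like these might be reproduced with lm_eval's Python API (the task names and arguments below are assumptions, not taken from the model card; check your lm_eval version for the exact Leaderboard-v2 task names):

import lm_eval

# Assumed Leaderboard-v2 task names; adjust to your lm_eval version
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=NX-AI/xLSTM-7b",
    tasks=["leaderboard_bbh", "leaderboard_mmlu_pro"],
    batch_size=8,
)
print(results["results"])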

Using HuggingFace's lighteval in the Leaderboard-v1 settings:

| Arc-Challenge (25-shot) | MMLU (5-shot) | Hellaswag (10-shot) | Winogrande (5-shot) | TruthfulQA (0-shot) | GSM8k (5-shot) | OpenbookQA (5-shot) | PiQA (5-shot) |
|-------------------------|---------------|---------------------|---------------------|---------------------|----------------|---------------------|---------------|
| 0.584 | 0.589 | 0.710 | 0.742 | 0.420 | 0.004 | 0.443 | 0.817 |

License

NXAI Community License (see LICENSE file)
