---
license: other
---

xLSTM-7B

This xLSTM-7B model was pre-trained on DCLM and selected high-quality data for a total of approx. 2.3 T tokens using the xlstm-jax framework.

How to use it

First, install xlstm, which now uses the mlstm_kernels package for Triton kernels (tested on Python 3.11):

pip install xlstm
pip install accelerate
pip install 'transformers @ git+https://github.com/huggingface/transformers.git@main'

If you get an error regarding the Triton library, install it from source:

pip install 'triton @ git+https://github.com/triton-lang/triton.git@main'
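
To check that the installation picked up the expected packages, you can run a quick sanity check (assuming the distribution names match the pip package names above; exact versions will differ):

from importlib.metadata import version
import torch

# Report the installed versions of the relevant packages.
print("xlstm:", version("xlstm"))
print("mlstm_kernels:", version("mlstm_kernels"))
print("transformers:", version("transformers"))
print("torch:", torch.__version__)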

Use the model as follows:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

xlstm = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b", device_map="auto")

# The tokenizer is a fork of the EleutherAI/gpt-neox-20b tokenizer
tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")

tokens = tokenizer("Explain quantum computing in simple terms.", return_tensors='pt')['input_ids'].to(xlstm.device)

# Get the BOS token ID from the tokenizer
bos_id = tokenizer.bos_token_id

# Prepend BOS
bos_tensor = torch.tensor([[bos_id]], device=tokens.device, dtype=tokens.dtype)
tokens_with_bos = torch.cat([bos_tensor, tokens], dim=1)

out = xlstm.generate(tokens_with_bos, max_new_tokens=20)

print(tokenizer.decode(out[0]))
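
Optionally, output can be streamed token by token with transformers' TextStreamer. A minimal sketch reusing xlstm, tokenizer, and tokens_with_bos from the example above:

from transformers import TextStreamer

# Prints decoded tokens to stdout as they are generated.
streamer = TextStreamer(tokenizer, skip_special_tokens=True)
_ = xlstm.generate(tokens_with_bos, max_new_tokens=200, streamer=streamer)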

If you cannot or do not want to use the Triton kernels, you can switch to the native PyTorch implementations:

from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
import torch

xlstm_config = AutoConfig.from_pretrained("NX-AI/xLSTM-7b")
xlstm_config.step_kernel = "native"
xlstm_config.chunkwise_kernel = "chunkwise--native_autograd"
xlstm_config.sequence_kernel = "native_sequence__native"

xlstm = AutoModelForCausalLM.from_pretrained("NX-AI/xLSTM-7b",
                                             config=xlstm_config, device_map="auto")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("NX-AI/xLSTM-7b")

# Your prompt
prompt = "Explain quantum computing in simple terms."

# Tokenize and send to the same device as the model
inputs = tokenizer(prompt, return_tensors="pt")['input_ids'].to(xlstm.device)

# Get the BOS token ID from the tokenizer
bos_id = tokenizer.bos_token_id

# Prepend BOS
bos_tensor = torch.tensor([[bos_id]], device=xlstm.device, dtype=inputs.dtype)
tokens_with_bos = torch.cat([bos_tensor, inputs], dim=1)

# Generate
outputs = xlstm.generate(
    tokens_with_bos,
    max_new_tokens=200,  # adjust for output length
    temperature=0.7,     # randomness
    top_p=0.9,           # nucleus sampling
    do_sample=True
)

# Decode and print
print(tokenizer.decode(outputs[0]))

# verify selected kernels
from pprint import pprint
pprint(xlstm.backbone.blocks[0].mlstm_layer.config)
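
Because the native kernels do not depend on Triton, they can in principle also be used without a GPU. The following is a minimal sketch, not an officially documented path; a 7B model in bfloat16 needs roughly 14 GB of RAM, and CPU generation is slow:

import torch
from transformers import AutoModelForCausalLM, AutoConfig

cpu_config = AutoConfig.from_pretrained("NX-AI/xLSTM-7b")
cpu_config.step_kernel = "native"
cpu_config.chunkwise_kernel = "chunkwise--native_autograd"
cpu_config.sequence_kernel = "native_sequence__native"

# Keep all weights on the CPU instead of placing them on GPUs.
xlstm_cpu = AutoModelForCausalLM.from_pretrained(
    "NX-AI/xLSTM-7b", config=cpu_config, torch_dtype=torch.bfloat16, device_map="cpu"
)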

Speed results

Generation speed using torch.cuda.graph and torch.compile optimizations on a single NVIDIA H100: [generation speed figure]
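
The exact benchmark script is not part of this card, but as a rough sketch of how torch.compile can be applied to the loaded model (compiling the forward pass so that generate picks it up; "reduce-overhead" mode enables CUDA-graph capture where possible):

import torch

# Hypothetical setup, not the benchmark code used for the figure above.
xlstm.forward = torch.compile(xlstm.forward, mode="reduce-overhead")
out = xlstm.generate(tokens_with_bos, max_new_tokens=100)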

Performance

[figure: MMLU over training tokens]

Using lm_eval (EleutherAI's LM Evaluation Harness):

| BBH | MMLU-Pro | Math | MUSR | GPQA | IfEval |
|-----|----------|------|------|------|--------|
| 0.381 | 0.242 | 0.036 | 0.379 | 0.280 | 0.244 |
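
A hedged sketch for reproducing such scores with the lm_eval Python API (the task names and settings here are illustrative; the exact configuration behind the table above is not specified in this card):

import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=NX-AI/xLSTM-7b,dtype=bfloat16",
    tasks=["leaderboard_mmlu_pro"],  # illustrative task choice, adjust as needed
    batch_size="auto",
)
print(results["results"])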

Using HuggingFace's lighteval in the Leaderboard-v1 settings:

| Arc-Challenge (25-shot) | MMLU (5-shot) | Hellaswag (10-shot) | Winogrande (5-shot) | TruthfulQA (0-shot) | GSM8k (5-shot) | OpenbookQA (5-shot) | PiQA (5-shot) |
|---|---|---|---|---|---|---|---|
| 0.584 | 0.589 | 0.710 | 0.742 | 0.420 | 0.004 | 0.443 | 0.817 |

License

NXAI Community License (see LICENSE file)