Introducing GenZ Infinite
The model is a fine-tuned version of Genz-13B-v2 with a context size of 16K. The model architecture is updated with the Λ-shaped (lambda) attention from the LM-Infinite paper, which lets the model handle sequence lengths of 120K+ tokens without degrading perplexity.
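For intuition, here is a minimal sketch (our own illustration, not the repo's implementation) of a Λ-shaped attention mask: each query attends only to the first few "global" tokens plus a causal sliding window of recent tokens, so attention cost stays bounded regardless of sequence length. The `local_branch` and `global_branch` names mirror the integration snippet below; LM-Infinite's distance-ceiling trick (`limit_distance`) is omitted for brevity.

```python
import torch

def lambda_attention_mask(seq_len, local_branch=2048, global_branch=10):
    """Boolean Lambda-shaped attention mask (True = may attend).

    Each query position sees the first `global_branch` tokens plus a
    causal sliding window of the `local_branch` most recent tokens.
    """
    q = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
    k = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
    causal = k <= q                         # no attending to the future
    local = (q - k) < local_branch          # sliding local window
    global_head = k < global_branch         # always-visible initial tokens
    return causal & (local | global_head)

# Toy example: with a window of 4 and 2 global tokens, position 9
# attends to tokens {0, 1} (global) and {6, 7, 8, 9} (local).
print(lambda_attention_mask(10, local_branch=4, global_branch=2)[9])
```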
Generate responses
Use the generate.py file from the GitHub repo:

```bash
python generate.py --base_model budecosystem/genz-13b-infinite
```
You can also integrate the model into your own code by loading the convert_llama_model function.
```python
import torch
from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer
from model.llama import convert_llama_model

# Lambda-attention settings: each token attends to a sliding window of the
# `local_branch` most recent tokens plus the first `global_branch` tokens.
local_branch = 2048
global_branch = 10

model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/genz-13b-infinite",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = convert_llama_model(model, local_branch, global_branch)
```
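From there, generation works as with any causal LM. A minimal end-to-end sketch continuing the snippet above; the prompt and decoding settings are illustrative, not prescribed by this card:

```python
tokenizer = AutoTokenizer.from_pretrained("budecosystem/genz-13b-infinite")

prompt = "Summarize the following document:\n..."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Decoding settings are illustrative; tune them for your use case.
gen_config = GenerationConfig(max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
output_ids = model.generate(**inputs, generation_config=gen_config)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```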
Evaluation
| Task | 4096 tokens | 5120 tokens | 8192 tokens | 16384 tokens |
|---|---|---|---|---|
| Passkey retrieval (accuracy %) | 100 | 75 | 48 | 30 |
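For reference, a passkey-retrieval probe hides a short secret inside long filler text and asks the model to repeat it back. A minimal sketch of prompt construction (the filler text and wording are our own, not the exact harness used for the numbers above):

```python
import random

def build_passkey_prompt(target_words, passkey, filler="The grass is green. The sky is blue. "):
    """Bury a passkey sentence at a random depth inside ~target_words of filler."""
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    sentences = [filler] * max(target_words // 8, 1)  # filler is ~8 words long
    sentences.insert(random.randrange(len(sentences) + 1), needle)
    return "".join(sentences) + "\nWhat is the pass key? The pass key is"

prompt = build_passkey_prompt(4096, passkey=68417)
```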
Training details
The model was trained on 4 A100 80GB GPUs for approximately 55 hours.
| Hyperparameters | Value |
|---|---|
| per_device_train_batch_size | 1 |
| gradient_accumulation_steps | 1 |
| epochs | 3 |
| steps | 8550 |
| learning_rate | 2e-4 |
| lr scheduler type | cosine |
| warmup steps | 1000 |
| optimizer | adamw |
| fp16 | True |
| GPU | 4 × A100 80GB |
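As a rough reconstruction, the table maps onto Hugging Face `TrainingArguments` like this (a sketch only; the actual training script, dataset, and any PEFT settings are not shown in this card, and the `output_dir` and AdamW variant are assumptions):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="genz-13b-infinite",      # assumed
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    num_train_epochs=3,                  # ~8,550 optimizer steps in total
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    optim="adamw_torch",                 # "adamw" in the table; exact variant assumed
    fp16=True,
)
```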
Acknowledgments
We'd like to thank the open-source community and the researchers whose foundational work made this model possible. Special thanks to the authors of the LM-Infinite paper and its GitHub repo.