Introducing GenZ Infinite
The model is a fine-tuned version of Genz-13B-v2 with a context size of 16K. The model architecture is updated with the Λ-shaped (lambda) attention from the LM-Infinite paper, which lets the model handle sequence lengths of 120K+ tokens without degrading perplexity.
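For intuition, here is a minimal sketch (our own illustration, not the repo's implementation) of a Λ-shaped attention mask: each query attends only to the first few "global" tokens plus a causal sliding window of recent tokens, so attention cost stays bounded regardless of sequence length. The `local_branch` and `global_branch` names mirror the integration snippet below; LM-Infinite's distance-ceiling trick (`limit_distance`) is omitted for brevity.

```python
import torch

def lambda_attention_mask(seq_len, local_branch=2048, global_branch=10):
    """Boolean Lambda-shaped attention mask (True = may attend).

    Each query position sees the first `global_branch` tokens plus a
    causal sliding window of the `local_branch` most recent tokens.
    """
    q = torch.arange(seq_len).unsqueeze(1)  # query positions, column vector
    k = torch.arange(seq_len).unsqueeze(0)  # key positions, row vector
    causal = k <= q                         # no attending to the future
    local = (q - k) < local_branch          # sliding local window
    global_head = k < global_branch         # always-visible initial tokens
    return causal & (local | global_head)

# Toy example: with a window of 4 and 2 global tokens, position 9
# attends to tokens {0, 1} (global) and {6, 7, 8, 9} (local).
print(lambda_attention_mask(10, local_branch=4, global_branch=2)[9])
```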
Generate responses
Use the generate.py file from the GitHub repo:

```bash
python generate.py --base_model budecosystem/genz-13b-infinite
```
You can also integrate the model into your own code by loading the convert_llama_model function.
```python
import torch
from transformers import GenerationConfig, AutoModelForCausalLM, AutoTokenizer
from model.llama import convert_llama_model

# Lambda-attention settings: each token attends to a sliding window of the
# `local_branch` most recent tokens plus the first `global_branch` tokens.
local_branch = 2048
global_branch = 10

model = AutoModelForCausalLM.from_pretrained(
    "budecosystem/genz-13b-infinite",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = convert_llama_model(model, local_branch, global_branch)
```
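From there, generation works as with any causal LM. A minimal end-to-end sketch continuing the snippet above; the prompt and decoding settings are illustrative, not prescribed by this card:

```python
tokenizer = AutoTokenizer.from_pretrained("budecosystem/genz-13b-infinite")

prompt = "Summarize the following document:\n..."  # placeholder prompt
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Decoding settings are illustrative; tune them for your use case.
gen_config = GenerationConfig(max_new_tokens=256, do_sample=True, temperature=0.7, top_p=0.9)
output_ids = model.generate(**inputs, generation_config=gen_config)
print(tokenizer.decode(output_ids[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```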
Evaluation
| Task | 4096 tokens | 5120 tokens | 8192 tokens | 16384 tokens |
|---|---|---|---|---|
| Passkey retrieval (accuracy %) | 100 | 75 | 48 | 30 |
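For reference, a passkey-retrieval probe hides a short secret inside long filler text and asks the model to repeat it back. A minimal sketch of prompt construction (the filler text and wording are our own, not the exact harness used for the numbers above):

```python
import random

def build_passkey_prompt(target_words, passkey, filler="The grass is green. The sky is blue. "):
    """Bury a passkey sentence at a random depth inside ~target_words of filler."""
    needle = f"The pass key is {passkey}. Remember it. {passkey} is the pass key. "
    sentences = [filler] * max(target_words // 8, 1)  # filler is ~8 words long
    sentences.insert(random.randrange(len(sentences) + 1), needle)
    return "".join(sentences) + "\nWhat is the pass key? The pass key is"

prompt = build_passkey_prompt(4096, passkey=68417)
```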
Training details
The model was trained on 4 A100 80GB GPUs for approximately 55 hours.
| Hyperparameters | Value |
|---|---|
| per_device_train_batch_size | 1 |
| gradient_accumulation_steps | 1 |
| epochs | 3 |
| steps | 8550 |
| learning_rate | 2e-4 |
| lr scheduler type | cosine |
| warmup steps | 1000 |
| optimizer | adamw |
| fp16 | True |
| GPU | 4 × A100 80GB |
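As a rough reconstruction, the table maps onto Hugging Face `TrainingArguments` like this (a sketch only; the actual training script, dataset, and any PEFT settings are not shown in this card, and the `output_dir` and AdamW variant are assumptions):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="genz-13b-infinite",      # assumed
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    num_train_epochs=3,                  # ~8,550 optimizer steps in total
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_steps=1000,
    optim="adamw_torch",                 # "adamw" in the table; exact variant assumed
    fp16=True,
)
```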
Acknowledgments
We'd like to thank the open-source community and the researchers whose foundational work made this model possible. Special thanks to the authors of the LM-Infinite paper and its GitHub repo.