Use in the same way as IlyaGusev/saiga2_7b_lora.

WARNING! Load the tokenizer with AutoTokenizer.from_pretrained(model_path, use_fast=True).
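A minimal sketch of the loading step from the warning above, assuming the `transformers` library is installed. The helper name `load_tokenizer` and the extra `is_fast` check are illustrative additions, not part of the original instructions. `model_path` stands for this model's repository id or a local directory.

```python
def load_tokenizer(model_path: str):
    """Load the tokenizer with use_fast=True, as the warning requires.

    `load_tokenizer` is a hypothetical helper name used for illustration.
    """
    # Imported lazily so this module can be imported without transformers.
    from transformers import AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=True)

    # Extra safety check (an assumption, not from the model card): fail loudly
    # if a slow tokenizer was returned instead of the fast one.
    if not tokenizer.is_fast:
        raise RuntimeError("Expected a fast tokenizer; got a slow one instead.")
    return tokenizer
```

Usage would then be `tokenizer = load_tokenizer(model_path)`, with the model itself loaded in the same way as for IlyaGusev/saiga2_7b_lora.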

Up to 60% faster generation and 35% faster training (on identical Russian text sequences!) with HF Transformers, because of the different tokenizer.
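One way to see where such a speedup could come from is to compare how many tokens two tokenizers produce for the same texts. This assumes the gain is driven by the replaced tokenizer encoding Russian text in fewer tokens. The sketch below is not the benchmark behind the numbers above. `count_tokens` and `token_reduction` are hypothetical helper names, and `encode` can be any callable that maps a string to a sequence of tokens.

```python
from typing import Callable, Iterable, Sequence

Encoder = Callable[[str], Sequence]


def count_tokens(encode: Encoder, texts: Iterable[str]) -> int:
    """Total number of tokens `encode` produces over all texts."""
    return sum(len(encode(text)) for text in texts)


def token_reduction(baseline_encode: Encoder,
                    new_encode: Encoder,
                    texts: Iterable[str]) -> float:
    """Fraction of tokens saved by `new_encode` relative to `baseline_encode`.

    0.0 means no change; 0.5 means half as many tokens for the same texts.
    """
    texts = list(texts)  # allow iterating twice over a generator
    baseline = count_tokens(baseline_encode, texts)
    if baseline == 0:
        raise ValueError("Baseline encoder produced no tokens.")
    return 1.0 - count_tokens(new_encode, texts) / baseline
```

With real tokenizers loaded through `AutoTokenizer`, `tokenizer.encode` can be passed as the `encode` callable for each side of the comparison.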

Colab: https://colab.research.google.com/drive/109ZhEB6STy-0jO-Z_4ttkWr1jg_FCTRW?usp=sharing

Paper: Tikhomirov M., Chernyshev D. Impact of Tokenization on LLaMa Russian Adaptation //arXiv preprint arXiv:2312.02598. – 2023.

Model description

Instruction-tuned version (trained on the Saiga datasets) of a Russian adaptation of LLaMa-2-7B obtained by replacing the tokenizer. Paper: Tikhomirov M.M., Chernyshev D.I., Impact of Tokenization on LLaMa Russian Adaptation (arXiv:2312.02598).
