This is an instruction following model (based on Qwen2.5-7B base) optimized for Russian language.

The model was trained in two phases: SFT (training data composition is similar to kolibri-mistral-0427) and RLHF.

Current RLHF pipeline leads to degradation on IFEval, but the overall 'vibe' of the model improves significantly. I am currently investigating the causes of this degradation and exploring methods to further enhance instruction-following capabilities.

The model uses ChatML template. Adding a system prompt will likely improve the model's performance on your tasks (experiment with it).

Instruction following evals

The model was tested using the following benchmarks:

Eval name Strict Value Loose Value
Avg. 43.00 49.17
ifeval-prompt-level 38.63 46.21
ifeval-instruction-level 51.20 57.5
ru-ifeval-prompt-level 35.30 40.48
ru-ifeval-instruction-level 46.88 52.52

Russian LLM Arena (proxy eval via JINA)

The table below approximates Russian LLM Arena scores using the JINA Judge model. Take it with a grain of salt.

Model Name Score 95% CI Avg Tokens
gpt-4-1106-preview 82.8 (-2.8, 2.6) 541
gpt-4o-mini 75.3 (-2.2, 2.8) 448
qwen-2.5-72b-it 73.1 (-3.0, 3.1) 557
gemma-2-9b-it-sppo-iter3 70.6 (-3.7, 3.0) 509
gemma-2-27b-it 68.7 (-2.9, 3.8) 472
t-lite-instruct-0.1 67.5 (-4.2, 2.7) 810
gemma-2-9b-it 67.0 (-3.0, 3.8) 459
suzume-llama-3-8B-multilingual-orpo-borda-half 62.4 (-3.0, 3.3) 682
glm-4-9b-chat 61.5 (-3.9, 3.3) 568
phi-3-medium-4k-instruct 60.4 (-3.8, 3.6) 566
sfr-iterative-dpo-llama-3-8b-r 57.2 (-3.8, 4.0) 516
kolibri-qwen2.5-7b-060225-rlhf-1 55.4 (-3.1, 4.4) 383
c4ai-command-r-v01 55.0 (-3.7, 4.4) 529
suzume-llama-3-8b-multilingual 51.9 (-3.1, 3.4) 641
mistral-nemo-instruct-2407 51.9 (-3.0, 3.0) 403
yandex_gpt_pro 50.3 (-3.5, 3.0) 345
gpt-3.5-turbo-0125 50.0 (0.0, 0.0) 220
hermes-2-theta-llama-3-8b 49.3 (-3.2, 3.7) 485
starling-lm-7b-beta 48.3 (-3.7, 3.9) 629
llama-3-8b-saiga-suzume-ties 47.9 (-3.9, 5.0) 763
llama-3-smaug-8b 47.6 (-4.3, 2.9) 524
vikhr-it-5.4-fp16-orpo-v2 46.8 (-2.4, 2.2) 379
aya-23-8b 46.1 (-3.3, 3.6) 554
saiga_llama3_8b_v6 44.8 (-2.9, 3.2) 471
qwen2-7b-instruct 43.6 (-3.5, 3.0) 340
vikhr-it-5.2-fp16-cp 43.6 (-3.6, 3.3) 543
openchat-3.5-0106 42.8 (-2.5, 3.8) 492
kolibri-mistral-0427-upd 42.3 (-4.1, 4.0) 551
paralex-llama-3-8b-sft 41.8 (-3.7, 3.9) 688
llama-3-instruct-8b-sppo-iter3 41.7 (-4.0, 3.6) 502
gpt-3.5-turbo-1106 41.5 (-2.7, 2.5) 191
mistral-7b-instruct-v0.3 41.1 (-4.1, 2.9) 469
gigachat_pro 40.9 (-3.2, 2.8) 294
openchat-3.6-8b-20240522 39.1 (-2.9, 3.8) 428
vikhr-it-5.3-fp16-32k 38.8 (-3.2, 3.3) 519
hermes-2-pro-llama-3-8b 38.4 (-3.9, 3.9) 463
kolibri-vikhr-mistral-0427 34.5 (-2.9, 3.1) 489
vikhr-it-5.3-fp16 33.5 (-3.0, 3.8) 523
llama-3-instruct-8b-simpo 32.7 (-3.2, 2.7) 417
meta-llama-3-8b-instruct 32.1 (-3.6, 4.2) 450
neural-chat-7b-v3-3 25.9 (-3.1, 3.2) 927
gigachat_lite 25.4 (-3.5, 2.7) 276
snorkel-mistral-pairrm-dpo 10.3 (-2.3, 2.6) 773
storm-7b 3.7 (-1.9, 1.7) 419
Downloads last month
11
Safetensors
Model size
7.61B params
Tensor type
BF16
·
Inference Providers NEW
This model is not currently available via any of the supported Inference Providers.
The model cannot be deployed to the HF Inference API: The model has no library tag.

Model tree for kaleinaNyan/kolibri-qwen2.5-7b-060225-rlhf-1

Base model

Qwen/Qwen2.5-7B
Finetuned
(246)
this model
Quantizations
1 model

Collection including kaleinaNyan/kolibri-qwen2.5-7b-060225-rlhf-1