kaleinaNyan/kolibri-qwen2.5-7b-060225-rlhf-1

This is an instruction following model (based on Qwen2.5-7B base) optimized for Russian language.

The model was trained in two phases: SFT (training data composition is similar to kolibri-mistral-0427) and RLHF.

Current RLHF pipeline leads to degradation on IFEval, but the overall 'vibe' of the model improves significantly. I am currently investigating the causes of this degradation and exploring methods to further enhance instruction-following capabilities.

The model uses ChatML template. Adding a system prompt will likely improve the model's performance on your tasks (experiment with it).

Instruction following evals

The model was tested using the following benchmarks:

Eval name	Strict Value	Loose Value
Avg.	43.00	49.17
ifeval-prompt-level	38.63	46.21
ifeval-instruction-level	51.20	57.5
ru-ifeval-prompt-level	35.30	40.48
ru-ifeval-instruction-level	46.88	52.52

Russian LLM Arena (proxy eval via JINA)

The table below approximates Russian LLM Arena scores using the JINA Judge model. Take it with a grain of salt.

Model Name	Score	95% CI	Avg Tokens
gpt-4-1106-preview	82.8	(-2.8, 2.6)	541
gpt-4o-mini	75.3	(-2.2, 2.8)	448
qwen-2.5-72b-it	73.1	(-3.0, 3.1)	557
gemma-2-9b-it-sppo-iter3	70.6	(-3.7, 3.0)	509
gemma-2-27b-it	68.7	(-2.9, 3.8)	472
t-lite-instruct-0.1	67.5	(-4.2, 2.7)	810
gemma-2-9b-it	67.0	(-3.0, 3.8)	459
suzume-llama-3-8B-multilingual-orpo-borda-half	62.4	(-3.0, 3.3)	682
glm-4-9b-chat	61.5	(-3.9, 3.3)	568
phi-3-medium-4k-instruct	60.4	(-3.8, 3.6)	566
sfr-iterative-dpo-llama-3-8b-r	57.2	(-3.8, 4.0)	516
kolibri-qwen2.5-7b-060225-rlhf-1	55.4	(-3.1, 4.4)	383
c4ai-command-r-v01	55.0	(-3.7, 4.4)	529
suzume-llama-3-8b-multilingual	51.9	(-3.1, 3.4)	641
mistral-nemo-instruct-2407	51.9	(-3.0, 3.0)	403
yandex_gpt_pro	50.3	(-3.5, 3.0)	345
gpt-3.5-turbo-0125	50.0	(0.0, 0.0)	220
hermes-2-theta-llama-3-8b	49.3	(-3.2, 3.7)	485
starling-lm-7b-beta	48.3	(-3.7, 3.9)	629
llama-3-8b-saiga-suzume-ties	47.9	(-3.9, 5.0)	763
llama-3-smaug-8b	47.6	(-4.3, 2.9)	524
vikhr-it-5.4-fp16-orpo-v2	46.8	(-2.4, 2.2)	379
aya-23-8b	46.1	(-3.3, 3.6)	554
saiga_llama3_8b_v6	44.8	(-2.9, 3.2)	471
qwen2-7b-instruct	43.6	(-3.5, 3.0)	340
vikhr-it-5.2-fp16-cp	43.6	(-3.6, 3.3)	543
openchat-3.5-0106	42.8	(-2.5, 3.8)	492
kolibri-mistral-0427-upd	42.3	(-4.1, 4.0)	551
paralex-llama-3-8b-sft	41.8	(-3.7, 3.9)	688
llama-3-instruct-8b-sppo-iter3	41.7	(-4.0, 3.6)	502
gpt-3.5-turbo-1106	41.5	(-2.7, 2.5)	191
mistral-7b-instruct-v0.3	41.1	(-4.1, 2.9)	469
gigachat_pro	40.9	(-3.2, 2.8)	294
openchat-3.6-8b-20240522	39.1	(-2.9, 3.8)	428
vikhr-it-5.3-fp16-32k	38.8	(-3.2, 3.3)	519
hermes-2-pro-llama-3-8b	38.4	(-3.9, 3.9)	463
kolibri-vikhr-mistral-0427	34.5	(-2.9, 3.1)	489
vikhr-it-5.3-fp16	33.5	(-3.0, 3.8)	523
llama-3-instruct-8b-simpo	32.7	(-3.2, 2.7)	417
meta-llama-3-8b-instruct	32.1	(-3.6, 4.2)	450
neural-chat-7b-v3-3	25.9	(-3.1, 3.2)	927
gigachat_lite	25.4	(-3.5, 2.7)	276
snorkel-mistral-pairrm-dpo	10.3	(-2.3, 2.6)	773
storm-7b	3.7	(-1.9, 1.7)	419

kaleinaNyan
/

kolibri-qwen2.5-7b-060225-rlhf-1

Instruction following evals

Russian LLM Arena (proxy eval via JINA)

Model tree for kaleinaNyan/kolibri-qwen2.5-7b-060225-rlhf-1

Collection including kaleinaNyan/kolibri-qwen2.5-7b-060225-rlhf-1

Kolibri