|
--- |
|
library_name: transformers |
|
license: mit |
|
datasets: |
|
- OLAIR/Open-R1-Ko-SFT-v2.0 |
|
language: |
|
- ko |
|
base_model: |
|
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B |
|
--- |
|
|
|
# Model Card: OLAIR/ko-r1-7b-v2.0.3 |
|
|
|
This document describes the OLAIR/ko-r1-7b-v2.0.3 model, including its training data, intended use, performance benchmarks, limitations, and ethical considerations. |
|
|
|
--- |
|
|
|
## 1. Overview |
|
|
|
**Model Name:** OLAIR/ko-r1-7b-v2.0.3 |
|
**Model Type:** Large Language Model (LLM) for Korean language understanding and reasoning |
|
**Version:** 2.0.3 |
|
|
|
This model is designed to provide Korean-language capabilities with a focus on reasoning tasks. Fine-tuned from [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B), it is the second major version in its series, improving on earlier iterations through updated training data and fine-tuning methodology.
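
A minimal loading sketch with the `transformers` library is shown below. The bfloat16/`device_map` settings, the chat-template call, and the sample prompt are assumptions for illustration and may need adjusting to your environment.

```python
# Minimal usage sketch (assumptions: the tokenizer ships a chat template,
# and bfloat16 + device_map="auto" fit your hardware).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "OLAIR/ko-r1-7b-v2.0.3"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example Korean prompt; replace with your own input.
messages = [{"role": "user", "content": "피타고라스 정리를 간단히 설명해 주세요."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(inputs, max_new_tokens=1024)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```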
|
|
|
|
|
## 2. Training Data |
|
|
|
The model was fine-tuned on OLAIR's [Open-R1-Ko-SFT-v2.0](https://huggingface.co/datasets/OLAIR/Open-R1-Ko-SFT-v2.0) dataset, a curated collection of Korean-language data prepared for supervised fine-tuning (SFT) to strengthen reasoning and natural language understanding in Korean.
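
For reference, the dataset can be pulled with the Hugging Face `datasets` library. The sketch below only inspects the schema, since the split and column names are not documented in this card.

```python
# Schema inspection sketch; split names and columns are not documented here,
# so we simply print whatever load_dataset reports.
from datasets import load_dataset

ds = load_dataset("OLAIR/Open-R1-Ko-SFT-v2.0")
print(ds)  # available splits, row counts, and column names
```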
|
|
|
|
|
## 3. Benchmark Performance |
|
|
|
The model's performance has been evaluated using the HAE-RAE Reasoning Challenge (HRC), which measures reasoning abilities across various domains. Below are the benchmark results for several models, including OLAIR/ko-r1-7b-v2.0.3: |
|
*Note:* An error was found in the earlier evaluation code and the scores were recomputed; the table below reflects the updated results, while the deprecated scores from the earlier run are kept in the collapsible section further down.
|
|
|
| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average | |
|
|---------------------------------------|-----------|--------|---------|----------------------|---------|---------| |
|
| o1-2024-12-17 | 57.14 | 78.18 | 77.78 | 80.00 | 84.62 | 75.54 | |
|
| o3-mini-high | 57.14 | 81.82 | 77.78 | 70.00 | 69.23 | 71.19 | |
|
| o3-mini-2025-01-31 | 50.00 | 80.00 | 70.37 | 50.00 | 76.92 | 65.46 | |
|
| o1-mini-2024-09-12 | 42.86 | 56.36 | 70.37 | 60.00 | 15.38 | 48.99 | |
|
| Deepseek-R1 | 50.00 | 54.55 | 62.96 | 70.00 | 7.69 | 49.04 | |
|
| gpt-4o-2024-11-20 | 35.71 | 32.73 | 51.85 | 50.00 | 53.85 | 44.83 | |
|
| Exaone-3.5-32B-Instruct | 21.43 | 30.91 | 25.93 | 50.00 | 38.46 | 33.35 | |
|
| Qwen2.5-72B-Instruct | 35.71 | 30.91 | 51.85 | 20.00 | 23.08 | 32.31 | |
|
| **Ko-R1-7B-v2.0.3** | 7.14 | 61.82 | 40.74 | 40.00 | 0.00 | 29.94 | |
|
| Ko-R1-7B-v1 | 7.14 | 63.64 | 37.04 | 40.00 | 0.00 | 29.56 | |
|
| gpt-4o-mini-2024-07-18 | 21.43 | 29.09 | 37.04 | 50.00 | 0.00 | 27.51 | |
|
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 28.57 | 16.36 | 33.33 | 10.00 | 15.38 | 20.73 | |
|
|
|
|
|
<details> |
|
<summary>Deprecated Scores (earlier evaluation)</summary>
|
|
|
| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average | |
|
|---------------------------------------|-----------|-------|---------|----------------------|---------|---------| |
|
| o1-2024-12-17 | 42.9 | 74.5 | 77.8 | 70.0 | 30.8 | 59.2 | |
|
| o3-mini-high | 35.7 | 72.7 | 70.4 | 70.0 | 23.1 | 54.4 | |
|
| o3-mini-2025-01-31 | 35.7 | 74.5 | 74.1 | 60.0 | 7.7 | 50.4 | |
|
| o1-mini-2024-09-12 | 35.7 | 54.5 | 63.0 | 60.0 | 0.0 | 42.6 | |
|
| Deepseek-R1 | 35.7 | 52.7 | 51.9 | 60.0 | 0.0 | 40.1 | |
|
| gpt-4o-2024-11-20 | 28.6 | 21.8 | 37.0 | 50.0 | 0.0 | 27.5 | |
|
| **Ko-R1-7B-v2.0.3** | **7.1** | **56.4** | **29.6** | **40.0** | **0.0** | **26.6** | |
|
| Qwen2.5-72B-Instruct | 35.7 | 29.1 | 37.0 | 30.0 | 0.0 | 26.4 | |
|
| Ko-R1-7B-v1 | 0.0 | 60.0 | 22.2 | 40.0 | 0.0 | 24.4 | |
|
| Exaone-3.5-32B-Instruct | 28.6 | 27.3 | 22.2 | 40.0 | 0.0 | 23.6 | |
|
| gpt-4o-mini-2024-07-18 | 7.1 | 29.1 | 22.2 | 50.0 | 0.0 | 21.7 | |
|
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 14.3 | 10.9 | 33.3 | 0.0 | 0.0 | 11.7 | |
|
|
|
</details> |
|
|
|
|
|
|
|
|
|
*Note:* The above table reflects performance across multiple reasoning domains. While OLAIR/ko-r1-7b-v2.0.3 is competitive in Math (61.82), it lags behind higher-performing counterparts elsewhere, most notably in Chemistry (7.14) and Puzzles (0.00).
|
|
|
|
|
## 4. Limitations |
|
|
|
- The model can still fall into endless reasoning ("thinking") loops on certain Korean-language inputs; a fix is in progress. A possible decoding-side workaround is sketched below.
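
One practical workaround, offered here as an illustrative assumption rather than an official recommendation, is to bound decoding so that a runaway reasoning trace is cut off:

```python
# Illustrative mitigation sketch: cap the token budget and mildly penalize
# repetition so an endless "thinking" loop cannot generate indefinitely.
# These decoding settings are assumptions, not tuned recommendations.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="OLAIR/ko-r1-7b-v2.0.3",
    device_map="auto",
)

result = generator(
    [{"role": "user", "content": "1부터 100까지의 합을 구해 주세요."}],
    max_new_tokens=2048,      # hard cap on generated tokens
    repetition_penalty=1.1,   # discourages degenerate repetition loops
)
print(result[0]["generated_text"][-1]["content"])  # assistant reply
```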
|
|
|
## 5. Citation and Contact

### How to Cite
|
|
|
``` |
|
To be added |
|
``` |
|
|
|
### Contact
|
``` |
|
[email protected] |
|
``` |