---
library_name: transformers
license: mit
datasets:
- OLAIR/Open-R1-Ko-SFT-v2.0
language:
- ko
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---
# Model Card: OLAIR/ko-r1-7b-v2.0.3
This document describes the OLAIR/ko-r1-7b-v2.0.3 model, including its training data, intended use, performance benchmarks, limitations, and ethical considerations.
---
## 1. Overview
**Model Name:** OLAIR/ko-r1-7b-v2.0.3
**Model Type:** Large Language Model (LLM) for Korean language understanding and reasoning
**Version:** 2.0.3
This model provides Korean-language capabilities with a focus on reasoning tasks. It is the second major version in the series, building on earlier iterations with improved training data and fine-tuning methodology.
## 2. Training Data
The model was trained on the dataset provided by OLAIR, specifically the [Open-R1-Ko-SFT-v2.0](https://huggingface.co/datasets/OLAIR/Open-R1-Ko-SFT-v2.0) dataset. This dataset includes a curated collection of Korean language data, optimized for supervised fine-tuning (SFT) to enhance reasoning and natural language understanding capabilities in Korean.
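The model can be loaded like any `transformers` causal LM. Below is a minimal, hedged usage sketch: it assumes `transformers` and `torch` are installed (plus `accelerate` for `device_map="auto"`), and that the tokenizer ships a chat template inherited from the DeepSeek-R1-Distill base model. Verify parameters against the repository's tokenizer and generation configs before relying on them.

```python
def build_chat(question: str) -> list[dict]:
    """Wrap a user question in the message format expected by apply_chat_template."""
    return [{"role": "user", "content": question}]


def main() -> None:
    # Heavy part: downloads ~15 GB of weights and needs a capable GPU.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "OLAIR/ko-r1-7b-v2.0.3"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype="auto"
    )

    messages = build_chat("서울에서 부산까지의 거리는 대략 얼마인가요?")
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))


# Uncomment to run the full example:
# main()
```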
## 3. Benchmark Performance
The model's performance has been evaluated using the HAE-RAE Reasoning Challenge (HRC), which measures reasoning abilities across various domains. Below are the benchmark results for several models, including OLAIR/ko-r1-7b-v2.0.3:
*Note:* Errors were found in the earlier evaluation code; the table below reflects the corrected scores, and the old numbers are kept under "Deprecated Scores" for reference.
| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average |
|---------------------------------------|-----------|--------|---------|----------------------|---------|---------|
| o1-2024-12-17 | 57.14 | 78.18 | 77.78 | 80.00 | 84.62 | 75.54 |
| o3-mini-high | 57.14 | 81.82 | 77.78 | 70.00 | 69.23 | 71.19 |
| o3-mini-2025-01-31 | 50.00 | 80.00 | 70.37 | 50.00 | 76.92 | 65.46 |
| o1-mini-2024-09-12 | 42.86 | 56.36 | 70.37 | 60.00 | 15.38 | 48.99 |
| Deepseek-R1 | 50.00 | 54.55 | 62.96 | 70.00 | 7.69 | 49.04 |
| gpt-4o-2024-11-20 | 35.71 | 32.73 | 51.85 | 50.00 | 53.85 | 44.83 |
| Exaone-3.5-32B-Instruct | 21.43 | 30.91 | 25.93 | 50.00 | 38.46 | 33.35 |
| Qwen2.5-72B-Instruct | 35.71 | 30.91 | 51.85 | 20.00 | 23.08 | 32.31 |
| **Ko-R1-7B-v2.0.3** | 7.14 | 61.82 | 40.74 | 40.00 | 0.00 | 29.94 |
| Ko-R1-7B-v1 | 7.14 | 63.64 | 37.04 | 40.00 | 0.00 | 29.56 |
| gpt-4o-mini-2024-07-18 | 21.43 | 29.09 | 37.04 | 50.00 | 0.00 | 27.51 |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 28.57 | 16.36 | 33.33 | 10.00 | 15.38 | 20.73 |
<details>
<summary>Deprecated Scores</summary>

| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average |
|---------------------------------------|-----------|-------|---------|----------------------|---------|---------|
| o1-2024-12-17 | 42.9 | 74.5 | 77.8 | 70.0 | 30.8 | 59.2 |
| o3-mini-high | 35.7 | 72.7 | 70.4 | 70.0 | 23.1 | 54.4 |
| o3-mini-2025-01-31 | 35.7 | 74.5 | 74.1 | 60.0 | 7.7 | 50.4 |
| o1-mini-2024-09-12 | 35.7 | 54.5 | 63.0 | 60.0 | 0.0 | 42.6 |
| Deepseek-R1 | 35.7 | 52.7 | 51.9 | 60.0 | 0.0 | 40.1 |
| gpt-4o-2024-11-20 | 28.6 | 21.8 | 37.0 | 50.0 | 0.0 | 27.5 |
| **Ko-R1-7B-v2.0.3** | **7.1** | **56.4** | **29.6** | **40.0** | **0.0** | **26.6** |
| Qwen2.5-72B-Instruct | 35.7 | 29.1 | 37.0 | 30.0 | 0.0 | 26.4 |
| Ko-R1-7B-v1 | 0.0 | 60.0 | 22.2 | 40.0 | 0.0 | 24.4 |
| Exaone-3.5-32B-Instruct | 28.6 | 27.3 | 22.2 | 40.0 | 0.0 | 23.6 |
| gpt-4o-mini-2024-07-18 | 7.1 | 29.1 | 22.2 | 50.0 | 0.0 | 21.7 |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 14.3 | 10.9 | 33.3 | 0.0 | 0.0 | 11.7 |
</details>
*Note:* The corrected table above reflects performance across multiple reasoning domains. OLAIR/ko-r1-7b-v2.0.3 is competitive in some areas (notably Math) but lags higher-performing counterparts elsewhere, particularly in Chemistry and the Puzzles category.
## 4. Limitations
- The model can still fall into endless reasoning loops on certain Korean-language inputs. A fix is in progress.
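Until that fix lands, callers can mitigate runaway reasoning by capping `max_new_tokens` at generation time and checking outputs for degenerate repetition. The sketch below is our own illustrative heuristic, not part of the model or the `transformers` library:

```python
def has_trailing_loop(text: str, window: int = 40, repeats: int = 3) -> bool:
    """Heuristic loop detector: True if the last `window` characters are
    repeated `repeats` times in a row at the end of `text`.

    A True result suggests the model entered a repetition loop and the
    output should be truncated or the request retried.
    """
    if len(text) < window * repeats:
        return False
    tail = text[-window:]
    return text.endswith(tail * repeats)


# Typical usage after generation (model/tokenizer setup omitted):
# outputs = model.generate(inputs, max_new_tokens=2048)  # hard cap on length
# reply = tokenizer.decode(outputs[0], skip_special_tokens=True)
# if has_trailing_loop(reply):
#     reply = reply[: len(reply) // 2]  # or retry with different sampling
```

Tuning `window` and `repeats` trades false positives against missed loops; short windows catch tight token-level cycles, longer ones catch repeated sentences.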
## 5. Citation and Contact
**How to Cite**
```
To be added
```
**Contact**
```
[email protected]
```