---
library_name: transformers
license: mit
datasets:
- OLAIR/Open-R1-Ko-SFT-v2.0
language:
- ko
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---

# Model Card: OLAIR/ko-r1-7b-v2.0.3

This document describes the OLAIR/ko-r1-7b-v2.0.3 model, including its training data, intended use, performance benchmarks, limitations, and ethical considerations.

---

## 1. Overview

**Model Name:** OLAIR/ko-r1-7b-v2.0.3  
**Model Type:** Large Language Model (LLM) for Korean language understanding and reasoning  
**Version:** 2.0.3

This model is designed to provide Korean language capabilities with a focus on reasoning tasks. It is the second version in its series, building upon previous iterations with improvements in training data and fine-tuning methodologies.


## 2. Training Data

The model was trained on the dataset provided by OLAIR, specifically the [Open-R1-Ko-SFT-v2.0](https://huggingface.co/datasets/OLAIR/Open-R1-Ko-SFT-v2.0) dataset. This dataset includes a curated collection of Korean language data, optimized for supervised fine-tuning (SFT) to enhance reasoning and natural language understanding capabilities in Korean.


## 3. Benchmark Performance

The model's performance has been evaluated using the HAE-RAE Reasoning Challenge (HRC), which measures reasoning abilities across various domains. Below are the benchmark results for several models, including OLAIR/ko-r1-7b-v2.0.3:
We've noticed some errors in the previous code and updated it.

| Model                                 | Chemistry | Math   | Physics | Physics Word Puzzles | Puzzles | Average |
|---------------------------------------|-----------|--------|---------|----------------------|---------|---------|
| o1-2024-12-17                         | 57.14     | 78.18  | 77.78   | 80.00                | 84.62   | 75.54   |
| o3-mini-high                          | 57.14     | 81.82  | 77.78   | 70.00                | 69.23   | 71.19   |
| o3-mini-2025-01-31                     | 50.00     | 80.00  | 70.37   | 50.00                | 76.92   | 65.46   |
| o1-mini-2024-09-12                     | 42.86     | 56.36  | 70.37   | 60.00                | 15.38   | 48.99   |
| Deepseek-R1                           | 50.00     | 54.55  | 62.96   | 70.00                | 7.69    | 49.04   |
| gpt-4o-2024-11-20                      | 35.71     | 32.73  | 51.85   | 50.00                | 53.85   | 44.83   |
| Exaone-3.5-32B-Instruct               | 21.43     | 30.91  | 25.93   | 50.00                | 38.46   | 33.35   |
| Qwen2.5-72B-Instruct                  | 35.71     | 30.91  | 51.85   | 20.00                | 23.08   | 32.31   |
| **Ko-R1-7B-v2.0.3**                   | 7.14      | 61.82  | 40.74   | 40.00                | 0.00    | 29.94   |
| Ko-R1-7B-v1                           | 7.14      | 63.64  | 37.04   | 40.00                | 0.00    | 29.56   |
| gpt-4o-mini-2024-07-18                 | 21.43     | 29.09  | 37.04   | 50.00                | 0.00    | 27.51   |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 28.57     | 16.36  | 33.33   | 10.00                | 15.38   | 20.73   |


<details>
<summary>Depricated Score</summary>  
  
| Model                                 | Chemistry | Math  | Physics | Physics Word Puzzles | Puzzles | Average |
|---------------------------------------|-----------|-------|---------|----------------------|---------|---------|
| o1-2024-12-17                         | 42.9      | 74.5  | 77.8    | 70.0                 | 30.8    | 59.2    |
| o3-mini-high                          | 35.7      | 72.7  | 70.4    | 70.0                 | 23.1    | 54.4    |
| o3-mini-2025-01-31                     | 35.7      | 74.5  | 74.1    | 60.0                 | 7.7     | 50.4    |
| o1-mini-2024-09-12                     | 35.7      | 54.5  | 63.0    | 60.0                 | 0.0     | 42.6    |
| Deepseek-R1                           | 35.7      | 52.7  | 51.9    | 60.0                 | 0.0     | 40.1    |
| gpt-4o-2024-11-20                      | 28.6      | 21.8  | 37.0    | 50.0                 | 0.0     | 27.5    |
| **Ko-R1-7B-v2.0.3**                   | **7.1**   | **56.4**  | **29.6**    | **40.0**                 | **0.0**     | **26.6**    |
| Qwen2.5-72B-Instruct                  | 35.7      | 29.1  | 37.0    | 30.0                 | 0.0     | 26.4    |
| Ko-R1-7B-v1                           | 0.0       | 60.0  | 22.2    | 40.0                 | 0.0     | 24.4    |
| Exaone-3.5-32B-Instruct               | 28.6      | 27.3  | 22.2    | 40.0                 | 0.0     | 23.6    |
| gpt-4o-mini-2024-07-18                 | 7.1       | 29.1  | 22.2    | 50.0                 | 0.0     | 21.7    |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 14.3      | 10.9  | 33.3    | 0.0                  | 0.0     | 11.7    |
  
</details> 


*Note:* The above table reflects performance across multiple reasoning domains. The metrics indicate that while OLAIR/ko-r1-7b-v2.0.3 shows competitive performance in certain areas (e.g., Math), there remain challenges, particularly in Chemistry and Physics-related tasks, compared to some higher-performing counterparts.


## 4. Limitations

- The model is still vulnerable to Korean-related inputs, leading to endless loops of thinking. We are working to fix it.

## ETC
How to Cite

```
To be added
```

Contact 
```
spthsrbwls123@yonsei.ac.kr
```