---
library_name: transformers
license: mit
datasets:
- OLAIR/Open-R1-Ko-SFT-v2.0
language:
- ko
base_model:
- deepseek-ai/DeepSeek-R1-Distill-Qwen-7B
---
# Model Card: OLAIR/ko-r1-7b-v2.0.3
This document describes the OLAIR/ko-r1-7b-v2.0.3 model, including its training data, intended use, performance benchmarks, limitations, and ethical considerations.
---
## 1. Overview
**Model Name:** OLAIR/ko-r1-7b-v2.0.3
**Model Type:** Large Language Model (LLM) for Korean language understanding and reasoning
**Version:** 2.0.3
This model provides Korean-language capabilities with a focus on reasoning tasks. It is the second major version in the series, building on earlier iterations with improved training data and fine-tuning methodology.
## 2. Training Data
The model was trained on the dataset provided by OLAIR, specifically the [Open-R1-Ko-SFT-v2.0](https://huggingface.co/datasets/OLAIR/Open-R1-Ko-SFT-v2.0) dataset. This dataset includes a curated collection of Korean language data, optimized for supervised fine-tuning (SFT) to enhance reasoning and natural language understanding capabilities in Korean.
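The model can be loaded like any `transformers` causal LM. Below is a minimal, hedged usage sketch: it assumes `transformers` and `torch` are installed (plus `accelerate` for `device_map="auto"`), and that the tokenizer ships a chat template inherited from the DeepSeek-R1-Distill base model. Verify parameters against the repository's tokenizer and generation configs before relying on them.

```python
def build_chat(question: str) -> list[dict]:
    """Wrap a user question in the message format expected by apply_chat_template."""
    return [{"role": "user", "content": question}]


def main() -> None:
    # Heavy part: downloads ~15 GB of weights and needs a capable GPU.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "OLAIR/ko-r1-7b-v2.0.3"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, device_map="auto", torch_dtype="auto"
    )

    messages = build_chat("서울에서 부산까지의 거리는 대략 얼마인가요?")
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)

    outputs = model.generate(inputs, max_new_tokens=1024)
    # Decode only the newly generated tokens, skipping the prompt.
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))


# Uncomment to run the full example:
# main()
```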
## 3. Benchmark Performance
The model's performance has been evaluated using the HAE-RAE Reasoning Challenge (HRC), which measures reasoning abilities across various domains. Below are the benchmark results for several models, including OLAIR/ko-r1-7b-v2.0.3:
*Note:* Errors were found in the earlier evaluation code; the table below reflects the corrected scores, and the old numbers are kept under "Deprecated Scores" for reference.
| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average |
|---------------------------------------|-----------|--------|---------|----------------------|---------|---------|
| o1-2024-12-17 | 57.14 | 78.18 | 77.78 | 80.00 | 84.62 | 75.54 |
| o3-mini-high | 57.14 | 81.82 | 77.78 | 70.00 | 69.23 | 71.19 |
| o3-mini-2025-01-31 | 50.00 | 80.00 | 70.37 | 50.00 | 76.92 | 65.46 |
| o1-mini-2024-09-12 | 42.86 | 56.36 | 70.37 | 60.00 | 15.38 | 48.99 |
| Deepseek-R1 | 50.00 | 54.55 | 62.96 | 70.00 | 7.69 | 49.04 |
| gpt-4o-2024-11-20 | 35.71 | 32.73 | 51.85 | 50.00 | 53.85 | 44.83 |
| Exaone-3.5-32B-Instruct | 21.43 | 30.91 | 25.93 | 50.00 | 38.46 | 33.35 |
| Qwen2.5-72B-Instruct | 35.71 | 30.91 | 51.85 | 20.00 | 23.08 | 32.31 |
| **Ko-R1-7B-v2.0.3** | 7.14 | 61.82 | 40.74 | 40.00 | 0.00 | 29.94 |
| Ko-R1-7B-v1 | 7.14 | 63.64 | 37.04 | 40.00 | 0.00 | 29.56 |
| gpt-4o-mini-2024-07-18 | 21.43 | 29.09 | 37.04 | 50.00 | 0.00 | 27.51 |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 28.57 | 16.36 | 33.33 | 10.00 | 15.38 | 20.73 |
<details>
<summary>Deprecated Scores</summary>

| Model | Chemistry | Math | Physics | Physics Word Puzzles | Puzzles | Average |
|---------------------------------------|-----------|-------|---------|----------------------|---------|---------|
| o1-2024-12-17 | 42.9 | 74.5 | 77.8 | 70.0 | 30.8 | 59.2 |
| o3-mini-high | 35.7 | 72.7 | 70.4 | 70.0 | 23.1 | 54.4 |
| o3-mini-2025-01-31 | 35.7 | 74.5 | 74.1 | 60.0 | 7.7 | 50.4 |
| o1-mini-2024-09-12 | 35.7 | 54.5 | 63.0 | 60.0 | 0.0 | 42.6 |
| Deepseek-R1 | 35.7 | 52.7 | 51.9 | 60.0 | 0.0 | 40.1 |
| gpt-4o-2024-11-20 | 28.6 | 21.8 | 37.0 | 50.0 | 0.0 | 27.5 |
| **Ko-R1-7B-v2.0.3** | **7.1** | **56.4** | **29.6** | **40.0** | **0.0** | **26.6** |
| Qwen2.5-72B-Instruct | 35.7 | 29.1 | 37.0 | 30.0 | 0.0 | 26.4 |
| Ko-R1-7B-v1 | 0.0 | 60.0 | 22.2 | 40.0 | 0.0 | 24.4 |
| Exaone-3.5-32B-Instruct | 28.6 | 27.3 | 22.2 | 40.0 | 0.0 | 23.6 |
| gpt-4o-mini-2024-07-18 | 7.1 | 29.1 | 22.2 | 50.0 | 0.0 | 21.7 |
| UNIVA-Bllossom_DeepSeek-llama3.1-Bllossom-8B | 14.3 | 10.9 | 33.3 | 0.0 | 0.0 | 11.7 |
</details>
*Note:* The corrected table above reflects performance across multiple reasoning domains. OLAIR/ko-r1-7b-v2.0.3 is competitive in some areas (notably Math) but lags higher-performing counterparts elsewhere, particularly in Chemistry and the Puzzles category.
## 4. Limitations
- The model can still fall into endless reasoning loops on certain Korean-language inputs. A fix is in progress.
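Until that fix lands, callers can mitigate runaway reasoning by capping `max_new_tokens` at generation time and checking outputs for degenerate repetition. The sketch below is our own illustrative heuristic, not part of the model or the `transformers` library:

```python
def has_trailing_loop(text: str, window: int = 40, repeats: int = 3) -> bool:
    """Heuristic loop detector: True if the last `window` characters are
    repeated `repeats` times in a row at the end of `text`.

    A True result suggests the model entered a repetition loop and the
    output should be truncated or the request retried.
    """
    if len(text) < window * repeats:
        return False
    tail = text[-window:]
    return text.endswith(tail * repeats)


# Typical usage after generation (model/tokenizer setup omitted):
# outputs = model.generate(inputs, max_new_tokens=2048)  # hard cap on length
# reply = tokenizer.decode(outputs[0], skip_special_tokens=True)
# if has_trailing_loop(reply):
#     reply = reply[: len(reply) // 2]  # or retry with different sampling
```

Tuning `window` and `repeats` trades false positives against missed loops; short windows catch tight token-level cycles, longer ones catch repeated sentences.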
## 5. Citation and Contact
**How to Cite**
```
To be added
```
**Contact**
```
[email protected]
```