---
language:
- en
- ko
license: cc-by-nc-4.0
tags:
- dnotitia
- nlp
- llm
- slm
- conversation
- chat
- reasoning
- r1
base_model:
- microsoft/phi-4
library_name: transformers
pipeline_tag: text-generation
---

# DNA-R1

<p align="center">
<img src="assets/dna-r1-logo.png" width="400" style="margin: 40px auto;">
</p>

We introduce **DNA-R1**, a specialized reasoning model for the Korean language built on Microsoft's Phi-4. By applying large-scale reinforcement learning (RL) with the same methodology as DeepSeek-R1, we significantly enhanced the model's Korean reasoning capabilities. DNA-R1 demonstrates a deep understanding of Korean text and exhibits strong reasoning abilities across mathematics, coding, and general reasoning tasks.

<p align="center">
<img src="assets/dna-r1-pipeline.png" width="100%" style="margin: 40px auto;">
</p>

## Training Methodology

Our comprehensive training pipeline consists of three strategic stages:

- **Stage 1:** Initial SFT with a large Korean non-reasoning dataset (760k examples) reused from our [DNA 1.0 8B Instruct](https://huggingface.co/dnotitia/Llama-DNA-1.0-8B-Instruct) training pipeline
- **Stage 2:** Strategic integration of Korean reasoning patterns from DeepSeek R1 using a specialized Korean reasoning dataset (300k examples)
- **Stage 3:** Advanced reinforcement learning with GRPO using a combined Korean/English reasoning dataset, with format, accuracy, and language consistency as rewards

DNA-R1 learns reasoning patterns tailored to the Korean language and demonstrates capabilities such as self-verification, reflection, and the generation of long chains of thought (CoT). We believe this represents a significant milestone for the Korean-language AI research community.
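
The exact Stage 3 reward implementations are not released; the sketch below only illustrates, under our own assumptions, how the three reward signals (format, accuracy, and language consistency) could be expressed. The function names, the exact-match accuracy check, and the equal weighting are all illustrative.

```python
import re

# Expected completion format: <think> reasoning </think> <answer> final reply </answer>
THINK_ANSWER = re.compile(r"^<think>(.*?)</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL)

def format_reward(completion: str) -> float:
    """1.0 if the completion follows the <think>/<answer> format, else 0.0."""
    return 1.0 if THINK_ANSWER.match(completion.strip()) else 0.0

def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the extracted answer matches the reference (exact match as a placeholder)."""
    m = THINK_ANSWER.match(completion.strip())
    return 1.0 if m and m.group(2).strip() == reference.strip() else 0.0

def language_reward(completion: str) -> float:
    """Fraction of Hangul characters among alphabetic characters, a crude proxy
    for keeping the reasoning in Korean for Korean prompts."""
    letters = [c for c in completion if c.isalpha()]
    if not letters:
        return 0.0
    hangul = sum(1 for c in letters if "\uac00" <= c <= "\ud7a3")
    return hangul / len(letters)

def total_reward(completion: str, reference: str) -> float:
    # Equal weighting is an assumption; the actual weights are not documented.
    return (format_reward(completion) + accuracy_reward(completion, reference)
            + language_reward(completion)) / 3.0
```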

## Model Specifications

- **Developed by:** Dnotitia Inc.
- **Supported Languages:** Korean, English
- **Model Release Date:** Mar 4, 2025
- **Number of Parameters:** 14B
- **License:** CC BY-NC 4.0

<div style="padding: 2px 8px; background-color: hsl(240, 100%, 50%, 0.1); border-radius: 5px">
  <p><strong>NOTICE:</strong></p>
  <p>This model may be used for commercial purposes. If you wish to use it commercially, please reach out via <a href="https://www.dnotitia.com/contact/post-form">Contact us</a> on the Dnotitia website. After a brief consultation process, we will approve the commercial use.</p>
</div>

## Technical Details

### Multi-Stage Training Pipeline

We implemented a sophisticated training approach to enhance Phi-4's Korean reasoning capabilities:

1. **Initial Foundation (Stage 1):** Supervised Fine-Tuning using our extensive Korean non-reasoning dataset from the established [DNA 1.0 8B Instruct](https://huggingface.co/dnotitia/Llama-DNA-1.0-8B-Instruct) training pipeline
2. **Reasoning Integration (Stage 2):** Specialized adaptation of DeepSeek R1's reasoning patterns with Korean-specific optimization through a meticulously curated dataset
3. **Advanced Refinement (Stage 3):** Reinforcement learning optimization using GRPO to perfect reasoning in both Korean and English, with comprehensive reward signals for format structure, factual accuracy, and language consistency

This methodical approach enables DNA-R1 to develop sophisticated chain-of-thought (CoT) reasoning for complex problem solving, resulting in a model finely calibrated for Korean language reasoning while maintaining robust general capabilities.
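
For illustration, a Stage 2-style supervised example could be rendered into training text with the model's chat template as below. The data schema and the placement of the tags inside the assistant turn are our assumptions; the actual training data format is not published.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("dnotitia/DNA-R1")

# Hypothetical reasoning sample: the assistant turn wraps its chain-of-thought in
# <think>...</think> and the final reply in <answer>...</answer>.
sample = [
    {"role": "user", "content": "1부터 100까지의 합은 얼마인가요?"},  # "What is the sum of 1 to 100?"
    {"role": "assistant",
     "content": "<think>등차수열의 합 공식 n(n+1)/2에 n=100을 대입하면 5050이다.</think>"
                "<answer>5050입니다.</answer>"},
]

# Render the conversation into the plain text string used for supervised fine-tuning.
print(tokenizer.apply_chat_template(sample, tokenize=False))
```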

### Performance Highlights

Our Korean-specific multi-stage training pipeline significantly enhances the Phi-4 base model's understanding of Korean context, reasoning depth, and response capabilities. The model excels at:

- Generating nuanced Korean chains-of-thought (CoT)
- Performing rigorous self-verification
- Solving multi-step complex problems
- Maintaining cultural and linguistic context in reasoning
- Separating extended reasoning from the concise final answer using the `<think>` and `<answer>` tags (a small parsing helper is sketched below)
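
Because the model separates its reasoning from the final reply with these tags, downstream code can split the two. The helper below is our own minimal sketch, not part of the model's tooling:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Split '<think>...</think><answer>...</answer>' output into (reasoning, answer).
    Falls back to the raw text as the answer if the tags are missing."""
    m = re.search(r"<think>(.*?)</think>\s*<answer>(.*?)</answer>", text, re.DOTALL)
    if m:
        return m.group(1).strip(), m.group(2).strip()
    return "", text.strip()

reasoning, answer = split_reasoning("<think>2 + 2 = 4</think><answer>4</answer>")
print(answer)  # -> 4
```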

## Evaluation Results

Below we present evaluation results for DNA-R1 across math, coding, science, Korean, and general benchmarks.
Despite having only 14B parameters, DNA-R1 outperforms many larger models on a range of these benchmarks.

<table>
  <thead>
    <tr>
      <th>Benchmark</th>
      <th>Task</th>
      <th>DNA-R1 (14B)</th>
      <th>DeepSeek-R1-Distill-Qwen-14B</th>
      <th>DeepSeek-R1-Distill-Qwen-32B</th>
      <th>EXAONE-3.5-32B-Instruct</th>
      <th>QwQ-32B-Preview</th>
      <th>gpt-4o-0513</th>
      <th>o1-mini</th>
      <th>o1-preview</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>GSM8K</td>
      <td rowspan="4">Math</td>
      <td><b>92.49</b></td>
      <td>88.63</td>
      <td>82.64</td>
      <td><u>91.9</u></td>
      <td>82.41</td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
    </tr>
    <tr>
      <td>Math500</td>
      <td><u>89.4</u></td>
      <td>88.2</td>
      <td>87.4</td>
      <td>75.8</td>
      <td><b>92.2</b></td>
      <td>75.8</td>
      <td>85.6</td>
      <td>81.4</td>
    </tr>
    <tr>
      <td>AIME2024</td>
      <td>53.3</td>
      <td><u>69.7</u></td>
      <td><b>72.6</b></td>
      <td>6.67</td>
      <td>50.0</td>
      <td>8.6</td>
      <td>64.0</td>
      <td>40</td>
    </tr>
    <tr>
      <td>OlympiadBench (Math, EN)</td>
      <td><u>59.3</u></td>
      <td>56.82</td>
      <td>55.34</td>
      <td>38.58</td>
      <td><b>62.17</b></td>
      <td>-</td>
      <td>-</td>
      <td>59.2</td>
    </tr>
    <tr>
      <td>GPQA-Diamond</td>
      <td>Science/Reasoning</td>
      <td><u>61.11</u></td>
      <td>59.1</td>
      <td>58.08</td>
      <td>33.33</td>
      <td>52.5</td>
      <td>46.5</td>
      <td>60</td>
      <td><b>75.2</b></td>
    </tr>
    <tr>
      <td>LiveCodeBench</td>
      <td>Coding</td>
      <td>50.58</td>
      <td>59.88</td>
      <td><u>61.65</u></td>
      <td>19.8</td>
      <td>59.12</td>
      <td>50.48</td>
      <td><b>72.75</b></td>
      <td>59.14</td>
    </tr>
    <tr>
      <td>KMMLU-direct</td>
      <td rowspan="3">Korean</td>
      <td><u>59.9</u></td>
      <td>50.5</td>
      <td>58.62</td>
      <td>-</td>
      <td><b>62.96</b></td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
    </tr>
    <tr>
      <td>KMMLU-hard</td>
      <td><u>36.65</u></td>
      <td>25.34</td>
      <td>33.67</td>
      <td>-</td>
      <td><b>37.98</b></td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
    </tr>
    <tr>
      <td>KoBEST</td>
      <td><u>83.05</u></td>
      <td>74.32</td>
      <td>78.53</td>
      <td>-</td>
      <td><b>85.93</b></td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
    </tr>
    <tr>
      <td>MMLU-Pro</td>
      <td rowspan="3">General</td>
      <td><u>57.64</u></td>
      <td>50.55</td>
      <td><b>59.58</b></td>
      <td>-</td>
      <td>46.82</td>
      <td>-</td>
      <td>-</td>
      <td>-</td>
    </tr>
  </tbody>
</table>

- The highest scores are shown in **bold**, and the second-highest scores are <u>underlined</u>.
- All benchmarks are evaluated with [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) and [skythought-eval](https://github.com/NovaSky-AI/SkyThought/tree/main/skythought/evals).
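
As a rough pointer for reproducing these numbers, a run along the following lines should work with the lm-eval harness (v0.4+ Python API). The task name, precision, and batch size shown here are illustrative choices, not the exact configuration used for the table above.

```python
import lm_eval

# Evaluate DNA-R1 on GSM8K with the EleutherAI evaluation harness.
results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=dnotitia/DNA-R1,dtype=bfloat16",
    tasks=["gsm8k"],
    batch_size=8,
)
print(results["results"]["gsm8k"])
```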

## Quickstart

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

tokenizer = AutoTokenizer.from_pretrained('dnotitia/DNA-R1')
model = AutoModelForCausalLM.from_pretrained('dnotitia/DNA-R1', device_map='auto')
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

conversation = [
    {"role": "user", "content": """
์–ด๋ ค์„œ๋ถ€ํ„ฐ ์šฐ๋ฆฌ ์ง‘์€ ๊ฐ€๋‚œํ–ˆ์—ˆ๊ณ 
๋‚จ๋“ค ๋‹คํ•˜๋Š” ์™ธ์‹ ๋ช‡ ๋ฒˆ ํ•œ ์ ์ด ์—†์—ˆ๊ณ 
์ผํ„ฐ์— ๋‚˜๊ฐ€์‹  ์–ด๋จธ๋‹ˆ ์ง‘์— ์—†์œผ๋ฉด
์–ธ์ œ๋‚˜ ํ˜ผ์ž์„œ ๋“์—ฌ ๋จน์—ˆ๋˜ ๋ผ๋ฉด
๊ทธ๋Ÿฌ๋‹ค ๋ผ๋ฉด์ด ๋„ˆ๋ฌด ์ง€๊ฒจ์›Œ์„œ
๋ง›์žˆ๋Š” ๊ฒƒ ์ข€ ๋จน์ž๊ณ  ๋Œ€๋“ค์—ˆ์—ˆ์–ด
๊ทธ๋Ÿฌ์ž ์–ด๋จธ๋‹˜์ด ๋งˆ์ง€๋ชปํ•ด ๊บผ๋‚ด์‹ 
์ˆจ๊ฒจ๋‘์‹  ๋น„์ƒ๊ธˆ์œผ๋กœ ์‹œ์ผœ์ฃผ์‹ 
์งœ์žฅ๋ฉด ํ•˜๋‚˜์— ๋„ˆ๋ฌด๋‚˜ ํ–‰๋ณตํ–ˆ์—ˆ์–ด
ํ•˜์ง€๋งŒ ์–ด๋จธ๋‹˜์€ ์™ ์ง€ ๋“œ์‹œ์งˆ ์•Š์•˜์–ด
์–ด๋จธ๋‹˜์€ ์งœ์žฅ๋ฉด์ด ์‹ซ๋‹ค๊ณ  ํ•˜์…จ์–ด
์–ด๋จธ๋‹˜์€ ์งœ์žฅ๋ฉด์ด ์‹ซ๋‹ค๊ณ  ํ•˜์…จ์–ด
์•ผ์ด์•ผ~์•ผ ๊ทธ๋ ‡๊ฒŒ ์‚ด์•„๊ฐ€๊ณ 
๊ทธ๋ ‡๊ฒŒ ํ›„ํšŒํ•˜๊ณ  ๋ˆˆ๋ฌผ๋„ ํ˜๋ฆฌ๊ณ 
์•ผ์ด์•ผ~์•ผ ๊ทธ๋ ‡๊ฒŒ ์‚ด์•„๊ฐ€๊ณ 
๋„ˆ๋ฌด๋‚˜ ์•„ํ”„๊ณ  ํ•˜์ง€๋งŒ ๋‹ค์‹œ ์›ƒ๊ณ 
---
์นœ๊ตฌ๊ฐ€ ์“ด ์‹œ์ธ๋ฐ, ์—ฌ๊ธฐ์„œ ์นœ๊ตฌ์˜ ์–ด๋จธ๋‹ˆ๊ฐ€ ์งœ์žฅ๋ฉด์ด ์‹ซ๋‹ค๊ณ  ํ•˜์‹  ์ด์œ ๋Š”?"""},
]
inputs = tokenizer.apply_chat_template(conversation,
                                       add_generation_prompt=True,
                                       return_dict=True,
                                       return_tensors="pt").to(model.device)
_ = model.generate(**inputs, streamer=streamer)
```
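
If you prefer to capture the completion as a string (for example, to separate the reasoning from the final answer with the tags described above) instead of streaming it, the Quickstart snippet can be continued as follows; the generation settings are illustrative, not officially recommended values.

```python
# Non-streaming variant, continuing from the Quickstart snippet above.
outputs = model.generate(**inputs, max_new_tokens=2048, do_sample=False)
completion = tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:],
                              skip_special_tokens=True)
print(completion)  # expected to contain <think>...</think><answer>...</answer>
```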


## License

This model is released under the CC BY-NC 4.0 license. If you have any questions or commercial usage inquiries, please [Contact us](https://www.dnotitia.com/contact/post-form).

## Citation

If you use or discuss this model in your academic research, please cite the project to help spread awareness:

```
@misc{dnar12025,
      title={DNA R1}, 
      author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
      year={2025},
      publisher={HuggingFace},
      url={https://huggingface.co/dnotitia/DNA-R1}
}
```