---
language:
- en
- ko
license: cc-by-nc-4.0
tags:
- dnotitia
- nlp
- llm
- slm
- conversation
- chat
- reasoning
- r1
base_model:
- microsoft/phi-4
library_name: transformers
pipeline_tag: text-generation
---
# DNA-R1
<p align="center">
<img src="assets/dna-r1-logo.png" width="400" style="margin: 40px auto;">
</p>

We introduce **DNA-R1**, a reasoning model specialized for the Korean language and built on Microsoft's Phi-4. By applying large-scale reinforcement learning (RL) with the same methodology as DeepSeek-R1, we significantly enhanced the model's Korean reasoning capabilities. The model demonstrates a deep understanding of Korean text and strong reasoning across mathematics, coding, and general reasoning tasks.
<p align="center">
<img src="assets/dna-r1-pipeline.png" width="100%" style="margin: 40px auto;">
</p>

## Training Methodology
Our training pipeline consists of three stages:
- **Stage 1:** Initial SFT on a large Korean non-reasoning dataset (760k examples) reused from our [DNA 1.0 8B Instruct](https://huggingface.co/dnotitia/Llama-DNA-1.0-8B-Instruct) training pipeline
- **Stage 2:** Adoption of DeepSeek-R1's reasoning patterns for Korean, using a specialized Korean reasoning dataset (300k examples)
- **Stage 3:** Reinforcement learning with GRPO on a combined Korean/English reasoning dataset, using format, accuracy, and language consistency as rewards

DNA-R1 has learned reasoning patterns tailored specifically to Korean and demonstrates capabilities such as self-verification, reflection, and the generation of long chains of thought (CoT). This represents a significant milestone for the AI research community working in Korean.
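The card names format, accuracy, and language consistency as the Stage 3 rewards but does not define them. As a minimal sketch only, assuming a `<think>`/`<answer>` output layout, exact-match answer checking, and a Hangul-versus-Latin letter ratio as a stand-in for language consistency, rule-based rewards of that kind could look like the following. Every function name and scoring rule here is hypothetical and is not Dnotitia's implementation.

```python
import re

# Hypothetical reward sketch. Assumed output layout: "<think>...</think><answer>...</answer>".
FORMAT_RE = re.compile(r"^\s*<think>.*?</think>\s*<answer>.*?</answer>\s*$", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)
TAG_RE = re.compile(r"</?(?:think|answer)>")
HANGUL_RE = re.compile(r"[\uac00-\ud7a3]")  # precomposed Hangul syllables
LATIN_RE = re.compile(r"[A-Za-z]")


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed think/answer layout, else 0.0."""
    return 1.0 if FORMAT_RE.match(completion) else 0.0


def accuracy_reward(completion: str, reference: str) -> float:
    """1.0 if the text inside <answer> exactly matches the reference, else 0.0."""
    match = ANSWER_RE.search(completion)
    if match is None:
        return 0.0
    return 1.0 if match.group(1).strip() == reference.strip() else 0.0


def language_consistency_reward(completion: str, prompt_lang: str) -> float:
    """Share of Hangul/Latin letters that match the prompt language ("ko" or "en")."""
    text = TAG_RE.sub("", completion)  # ignore the tag names themselves
    hangul = len(HANGUL_RE.findall(text))
    latin = len(LATIN_RE.findall(text))
    if hangul + latin == 0:
        return 0.0
    matching = hangul if prompt_lang == "ko" else latin
    return matching / (hangul + latin)


def total_reward(completion: str, reference: str, prompt_lang: str) -> float:
    # Unweighted sum (an assumption); any weighting used for DNA-R1 is not stated.
    return (format_reward(completion)
            + accuracy_reward(completion, reference)
            + language_consistency_reward(completion, prompt_lang))


sample = "<think>2 더하기 2는 4입니다.</think><answer>4</answer>"
print(total_reward(sample, "4", "ko"))  # 3.0
```

Real RL pipelines usually verify math answers more leniently than exact string matching; exact match is used here only to keep the sketch short.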
## Model Specifications
- **Developed by:** Dnotitia Inc.
- **Supported Languages:** Korean, English
- **Model Release Date:** Mar 4, 2025
- **Number of Parameters:** 14B
- **License:** CC BY-NC 4.0
<div style="padding: 2px 8px; background-color: hsl(240, 100%, 50%, 0.1); border-radius: 5px">
<p><strong>NOTICE:</strong></p>
<p>This model may not be used for commercial purposes. If you wish to use it commercially, please get in touch via the <a href="https://www.dnotitia.com/contact/post-form">Contact us</a> page on the Dnotitia homepage. After a brief consultation process, we will approve commercial use.</p>
</div>

## Technical Details
### Multi-Stage Training Pipeline
We implemented a multi-stage training approach to strengthen Phi-4's Korean reasoning capabilities:
1. **Initial Foundation (Stage 1):** Supervised fine-tuning (SFT) on our extensive Korean non-reasoning dataset from the established [DNA 1.0 8B Instruct](https://huggingface.co/dnotitia/Llama-DNA-1.0-8B-Instruct) training pipeline
2. **Reasoning Integration (Stage 2):** Adaptation of DeepSeek-R1's reasoning patterns to Korean through a carefully curated dataset
3. **Advanced Refinement (Stage 3):** Reinforcement learning with GRPO to refine reasoning in both Korean and English, using reward signals for format, accuracy, and language consistency

This staged approach enables DNA-R1 to develop sophisticated chain-of-thought (CoT) reasoning for complex problem solving, yielding a model tuned for Korean-language reasoning while retaining robust general capabilities.
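GRPO (Group Relative Policy Optimization), introduced in the DeepSeekMath work and used to train DeepSeek-R1, replaces a learned value function with a group baseline: several completions are sampled per prompt, and each completion's reward is normalized against the others in its group. The sketch below covers only that advantage step. The clipped policy-ratio objective, the KL penalty, and the group size and epsilon used for DNA-R1 are not given in this card and are either omitted or assumed.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantages: normalize each reward within its sampling group.

    `rewards` holds the scalar rewards of G completions sampled for the same prompt.
    The population standard deviation is used here; implementations differ on this.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # Completions above the group mean get positive advantages, those below get negative.
    # eps avoids division by zero when every completion receives the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one prompt.
print(group_relative_advantages([3.0, 1.0, 2.0, 2.0]))  # roughly [1.41, -1.41, 0.0, 0.0]
```

When all completions in a group score the same, every advantage is zero, so that prompt contributes no policy-gradient signal in this formulation.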
### Performance Highlights
Our Korean-specific multi-stage training pipeline significantly enhances the Phi-4 base model's understanding of Korean context, reasoning depth, and response capabilities. The model excels at:
- Generating nuanced Korean chains-of-thought (CoT)
- Performing rigorous self-verification
- Solving multi-step complex problems
- Maintaining cultural and linguistic context in reasoning
- Distinguishing between deep thinking and concise answers using the `<think>` and `<answer>` tags
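Because the model separates its reasoning from its final answer with `<think>` and `<answer>` tags, downstream code may want to split the two. A minimal sketch, assuming the decoded text contains the tags literally (if they are registered as special tokens, decoding with `skip_special_tokens=True` may strip them). `split_reasoning` is a hypothetical helper, not part of `transformers`.

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def split_reasoning(output: str) -> tuple[str, str]:
    """Hypothetical helper: return (reasoning, answer) from a decoded response."""
    think = THINK_RE.search(output)
    reasoning = think.group(1).strip() if think else ""
    answer = ANSWER_RE.search(output)
    if answer:
        final = answer.group(1).strip()
    elif think:
        # No <answer> tag: treat whatever follows </think> as the answer.
        final = output[think.end():].strip()
    else:
        # No tags at all: treat the whole output as the answer.
        final = output.strip()
    return reasoning, final


print(split_reasoning("<think>\n12 * 3 = 36\n</think>\n<answer>36</answer>"))
# ('12 * 3 = 36', '36')
```

With the Quickstart code, you would decode the tokens returned by `model.generate` with `tokenizer.decode` and pass the resulting string to a helper like this.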
## Evaluation Results
Below, we present our evaluation results for the DNA-R1 model across math, coding, science, Korean, and general-performance benchmarks.
Despite having only 14B parameters, DNA-R1 outperforms larger models such as DeepSeek-R1-Distill-Qwen-32B and EXAONE-3.5-32B-Instruct on most of the benchmarks reported below.
<table>
<thead>
<tr>
<th>Benchmark</th>
<th>Task</th>
<th>DNA-R1 (14B)</th>
<th>DeepSeek-R1-Distill-Qwen-14B</th>
<th>DeepSeek-R1-Distill-Qwen-32B</th>
<th>EXAONE-3.5-32B-Instruct</th>
<th>QwQ-32B-Preview</th>
<th>gpt-4o-0513</th>
<th>o1-mini</th>
<th>o1-preview</th>
</tr>
</thead>
<tbody>
<tr>
<td>GSM8K</td>
<td rowspan="4">Math</td>
<td><b>92.49</b></td>
<td>88.63</td>
<td>82.64</td>
<td><u>91.9</u></td>
<td>82.41</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>Math500</td>
<td><u>89.4</u></td>
<td>88.2</td>
<td>87.4</td>
<td>75.8</td>
<td><b>92.2</b></td>
<td>75.8</td>
<td>85.6</td>
<td>81.4</td>
</tr>
<tr>
<td>AIME2024</td>
<td>53.3</td>
<td><u>69.7</u></td>
<td><b>72.6</b></td>
<td>6.67</td>
<td>50.0</td>
<td>8.6</td>
<td>64.0</td>
<td>40</td>
</tr>
<tr>
<td>OlympiadBench (Math, EN)</td>
<td><u>59.3</u></td>
<td>56.82</td>
<td>55.34</td>
<td>38.58</td>
<td><b>62.17</b></td>
<td>-</td>
<td>-</td>
<td>59.2</td>
</tr>
<tr>
<td>GPQA-Diamond</td>
<td>Science/Reasoning</td>
<td><u>61.11</u></td>
<td>59.1</td>
<td>58.08</td>
<td>33.33</td>
<td>52.5</td>
<td>46.5</td>
<td>60</td>
<td><b>75.2</b></td>
</tr>
<tr>
<td>LiveCodeBench</td>
<td>Coding</td>
<td>50.58</td>
<td>59.88</td>
<td><u>61.65</u></td>
<td>19.8</td>
<td>59.12</td>
<td>50.48</td>
<td><b>72.75</b></td>
<td>59.14</td>
</tr>
<tr>
<td>KMMLU-direct</td>
<td rowspan="3">Korean</td>
<td><u>59.9</u></td>
<td>50.5</td>
<td>58.62</td>
<td>-</td>
<td><b>62.96</b></td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>KMMLU-hard</td>
<td><u>36.65</u></td>
<td>25.34</td>
<td>33.67</td>
<td>-</td>
<td><b>37.98</b></td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>KoBEST</td>
<td><u>83.05</u></td>
<td>74.32</td>
<td>78.53</td>
<td>-</td>
<td><b>85.93</b></td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
<tr>
<td>MMLU-Pro</td>
<td>General</td>
<td><u>57.64</u></td>
<td>50.55</td>
<td><b>59.58</b></td>
<td>-</td>
<td>46.82</td>
<td>-</td>
<td>-</td>
<td>-</td>
</tr>
</tbody>
</table>

- The highest scores are in **bold**, and the second-highest scores are <u>underlined</u>.
- All benchmarks were evaluated with [lm-eval](https://github.com/EleutherAI/lm-evaluation-harness) and [skythought-eval](https://github.com/NovaSky-AI/SkyThought/tree/main/skythought/evals).
## Quickstart
```python
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
tokenizer = AutoTokenizer.from_pretrained('dnotitia/DNA-R1')
model = AutoModelForCausalLM.from_pretrained('dnotitia/DNA-R1', device_map='auto')
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
# Example prompt in Korean: a poem (described in the prompt as written by a friend)
# about growing up poor, followed by the question "This is a poem my friend wrote.
# Why did the friend's mother say she disliked jajangmyeon?"
conversation = [
    {"role": "user", "content": """
어려서부터 우리 집은 가난했었고
남들 다하는 외식 몇 번 한 적이 없었고
일터에 나가신 어머니 집에 없으면
언제나 혼자서 끓여 먹었던 라면
그러다 라면이 너무 지겨워서
맛있는 것 좀 먹자고 대들었었어
그러자 어머님이 마지못해 꺼내신
숨겨두신 비상금으로 시켜주신
짜장면 하나에 너무나 행복했었어
하지만 어머님은 왠지 드시질 않았어
어머님은 짜장면이 싫다고 하셨어
어머님은 짜장면이 싫다고 하셨어
야이야~야 그렇게 살아가고
그렇게 후회하고 눈물도 흘리고
야이야~야 그렇게 살아가고
너무나 아프고 하지만 다시 웃고
---
친구가 쓴 시인데, 여기서 친구의 어머니가 짜장면이 싫다고 하신 이유는?"""},
]
# Apply the model's chat template and move the tensors to the model's device.
inputs = tokenizer.apply_chat_template(conversation,
                                       add_generation_prompt=True,
                                       return_dict=True,
                                       return_tensors="pt").to(model.device)
# Stream the reasoning and answer to stdout as they are generated.
# max_new_tokens is an illustrative value; long reasoning traces may need more.
_ = model.generate(**inputs, streamer=streamer, max_new_tokens=4096)
```
## License
This model is released under the CC BY-NC 4.0 license. For questions or commercial usage inquiries, please [Contact us](https://www.dnotitia.com/contact/post-form).
## Citation
If you use or discuss this model in your academic research, please cite the project to help spread awareness:
```
@misc{dnar12025,
title={DNA R1},
author={Jungyup Lee and Jemin Kim and Sang Park and SeungJae Lee},
year={2025},
publisher={HuggingFace},
url={https://huggingface.co/dnotitia/DNA-R1}
}
```