suayptalha committed: Update README.md

README.md CHANGED
@@ -14,7 +14,6 @@ tags:
 - sft
 ---
 
-
 <!DOCTYPE html>
 <style>
 body {
@@ -607,6 +606,8 @@ a:hover .link-arrow {
 Maestro-R1-Llama-8B is a powerful language model fine-tuned from DeepSeek-R1-Distill-Llama-8B, a Llama-3-based model distilled from DeepSeek-R1 on a large corpus of diverse data. This distillation lets the model retain strong reasoning capabilities at a smaller parameter count.
 <br>
 Maestro-R1-Llama-8B builds on this foundation with further fine-tuning on the ServiceNow-AI/R1-Distill-SFT dataset, which sharpens its ability to handle specialized tasks and improves its reasoning, problem-solving, and code generation. The combination of a distilled base model and domain-specific fine-tuning makes Maestro-R1-Llama-8B an efficient, robust model across a wide range of language tasks.
+<br>
+DeepSeek-R1 Paper Link: https://arxiv.org/abs/2501.12948
 </div>
 <div class="model-composition">
 <h4>Loss Graph</h4>
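The model card text above describes a standard causal-LM fine-tune, so a minimal inference sketch may help. It assumes the model is published as `suayptalha/Maestro-R1-Llama-8B` (a repo id inferred from the commit author and model name, not stated in the diff) and uses only the standard `transformers` API.

```python
# Minimal inference sketch for Maestro-R1-Llama-8B.
# Assumption: the Hugging Face repo id is "suayptalha/Maestro-R1-Llama-8B"
# (inferred from the commit author and model name; not confirmed by this diff).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "suayptalha/Maestro-R1-Llama-8B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# DeepSeek-R1 distills are chat-tuned, so format the prompt with the
# tokenizer's chat template rather than passing raw text.
messages = [{"role": "user", "content": "What is 17 * 24? Think step by step."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=512)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```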