Update README.md
README.md CHANGED
# **Magellanic-Llama-70B-r999**
Magellanic-Llama-70B-r999 is a Llama-based model fine-tuned from DeepSeek-R1-Distill-Llama-70B, a model trained via large-scale reinforcement learning (RL) without supervised fine-tuning (SFT) as a preliminary step, and one that has demonstrated remarkable reasoning performance. Through RL, it has been trained on nearly 1 million data entries, improving safety while preserving factual accuracy.

Additionally, it addresses issues such as endless repetition, poor readability, and language mixing. This approach allows the model to explore chain-of-thought (CoT) reasoning when solving complex problems, improving its reasoning patterns and aligning it with human preferences. Two SFT stages serve as the seed for the model's reasoning and non-reasoning capabilities.

# **Use with Transformers**
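Below is a minimal sketch of loading and prompting the model with the Transformers library. The repository id is a placeholder (the card does not state it), and the precision and device settings are assumptions: a 70B-parameter model generally requires bfloat16/float16 weights sharded across multiple GPUs (via `accelerate`) or CPU offloading.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder repo id: replace with the actual Hugging Face repository for this model.
model_id = "your-org/Magellanic-Llama-70B-r999"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumed precision; float16 also works on most GPUs
    device_map="auto",           # shards the 70B weights across available devices (requires accelerate)
)

# Build a chat-formatted prompt using the tokenizer's chat template.
messages = [
    {"role": "user", "content": "Walk through the reasoning to solve: what is 17 * 24?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# Generate, then decode only the newly produced tokens.
output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```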