Update README.md
README.md
CHANGED
@@ -17,7 +17,7 @@ language:
 This model is built on top of **[Qwen/Qwen2.5-3B-Instruct](https://huggingface.co/Qwen/Qwen2.5-3B-Instruct)** and finetuned using **LoRA** (Low-Rank Adaptation) and **RLHF**-style reward optimization, leveraging **vLLM** for fast inference. It is designed to respond with a specific structure (i.e., `<reasoning> ... </reasoning>` and `<final_argument> ... </final_argument>` sections) and maximize the number of well-formed argument-objection pairs.
 
 # Training script
-Script here: https://colab.research.google.com/drive/15DVOLcs3dopw0xPQgxaG3LVwj266XWVS?usp=sharing
+Script here (example of how to do inference at the bottom): https://colab.research.google.com/drive/15DVOLcs3dopw0xPQgxaG3LVwj266XWVS?usp=sharing
 
 ## Key Features
 - **Base Model**: Qwen/Qwen2.5-3B-Instruct
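The README says completions are meant to follow a fixed `<reasoning> ... </reasoning>` then `<final_argument> ... </final_argument>` layout. Below is a minimal sketch of how downstream code might split a generated completion into those two sections. Only the tag names come from the README. The function name `parse_structured_response` and the rule that `<reasoning>` must come before `<final_argument>` are illustrative assumptions, and the sketch does not load or run the model.

```python
import re
from typing import Dict, Optional

# Tag names come from the README. The ordering rule (reasoning first, then
# final_argument) and the function name are assumptions for illustration.
_RESPONSE_RE = re.compile(
    r"<reasoning>(.*?)</reasoning>\s*<final_argument>(.*?)</final_argument>",
    re.DOTALL,  # let '.' match newlines so multi-line sections are captured
)


def parse_structured_response(text: str) -> Optional[Dict[str, str]]:
    """Split a completion into its two sections, or return None if malformed."""
    match = _RESPONSE_RE.search(text)
    if match is None:
        # Missing tags, or sections in the wrong order.
        return None
    return {
        "reasoning": match.group(1).strip(),
        "final_argument": match.group(2).strip(),
    }


if __name__ == "__main__":
    completion = (
        "<reasoning>\nPremise 1 is contested.\n</reasoning>\n"
        "<final_argument>\nArgument: A. Objection: B.\n</final_argument>"
    )
    print(parse_structured_response(completion))
```

Returning `None` for malformed output, rather than raising, makes it easy to count how often generations break the expected format when spot-checking inference results.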