Pinkstack committed on
Commit
3f76e4c
·
verified ·
1 Parent(s): 630dd1c

Update README.md

Files changed (1)
  1. README.md +68 -6
README.md CHANGED
@@ -1,22 +1,84 @@
1
  ---
2
- base_model: Pinkstack/Superthoughts-lite-1.8B-sft
3
  tags:
4
  - text-generation-inference
5
  - transformers
6
  - unsloth
7
  - llama
8
  - gguf
9
  license: apache-2.0
10
  language:
11
  - en
12
  ---
13
 
14
  # Uploaded model
15
 
16
  - **Developed by:** Pinkstack
17
  - **License:** apache-2.0
18
- - **Finetuned from model :** Pinkstack/Superthoughts-lite-1.8B-sft
19
-
20
- This llama model was trained 2x faster with [Unsloth](https://github.com/unslothai/unsloth) and Huggingface's TRL library.
21
-
22
- [<img src="https://raw.githubusercontent.com/unslothai/unsloth/main/images/unsloth%20made%20with%20love.png" width="200"/>](https://github.com/unslothai/unsloth)
 
1
  ---
2
+ base_model:
3
+ - Pinkstack/Superthoughts-lite-v1
4
  tags:
5
  - text-generation-inference
6
  - transformers
7
  - unsloth
8
  - llama
9
  - gguf
10
+ - cot
11
+ - superthoughts
12
+ - reasoning
13
+ - grpo
14
  license: apache-2.0
15
  language:
16
  - en
17
+ datasets:
18
+ - openai/gsm8k
19
+ - Pinkstack/intructions-sft-sharegpt
20
  ---
21
 
22
+ ![superthoughts lite](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/K5kYIHYj2aX2kB6MlcM9O.png)
23
+
24
+ # Information
25
+ Advanced, high-quality, lightweight reasoning in a model small enough to run on your phone.
26
+
27
+ Trained similarly to DeepSeek R1: we used SmolLM2 as the base model, fine-tuned it with SFT on reasoning data, and modified the tokenizer slightly. After the SFT stage, we used GRPO to further strengthen its mathematics and problem-solving abilities.
28
+
29
+ # Which quant is right for you?
30
+
31
+ ***F16***: Fewest hallucinations and high-quality reasoning, but heavy to run.
32
+ ***Q8_0***: Few hallucinations and high-quality reasoning. Recommended.
33
+ ***Q6_k***: Hallucinates more. Reasoning is good but may fail at tasks such as counting. Use only if you cannot run Q8_0.
34
+ ***Q4_k_m***: Not recommended. Hallucinates and doesn't always think properly, though it is easier to run.
35
+
36
+ # Format
37
+ ```
38
+ <|im_start|>user
39
+ How many R's in strawberry<|im_end|>
40
+ <|im_start|>assistant
41
+ <think>
42
+ Alright, the user has asked how many R's in the word strawberry, that's easy! I just need to count each instance of the letter 'R' in the word 's-t-r-a-w-b-e-r-r-y' and then find out how many R's there are, lets count!
43
+ S - Not an R,
44
+ T - Not an R,
45
+ R - First instance of the letter R! (1),
46
+ A - Not an R,
47
+ W - Not an R,
48
+ B - Not an R,
49
+ E - Not an R,
50
+ R - Great! Second instance of the letter R. (2),
51
+ R - Third instance of the letter R. (3),
52
+ Y - Not an R.
53
+
54
+ So, i've counted all the letters correctly, meaning that I am sure that there are 3 R's in the word Strawberry. I should probably let the user know.
55
+ </think>
56
+ <output>3
57
+ </output><|im_end|>
58
+ ```
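The reply format above wraps the reasoning in `<think>` tags and the final answer in `<output>` tags, so a client usually wants to separate the two. The following is a minimal Python sketch of one way to do that. The names `ParsedReply` and `parse_reply` are hypothetical helpers and not part of the model or any library, and the sample string is a shortened stand-in for a real generation.

```python
import re
from typing import NamedTuple, Optional


class ParsedReply(NamedTuple):
    """Hypothetical container for the two parts of a model reply."""
    thinking: Optional[str]
    output: Optional[str]


# DOTALL lets "." match newlines, since both sections span several lines.
_THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
_OUTPUT_RE = re.compile(r"<output>(.*?)</output>", re.DOTALL)


def parse_reply(text: str) -> ParsedReply:
    """Split a raw reply into its reasoning and its final answer.

    Either field is None if the corresponding tag pair is missing,
    which can happen when the model ignores the requested format.
    """
    # Drop the end-of-turn marker if the runtime did not strip it.
    text = text.replace("<|im_end|>", "")
    think = _THINK_RE.search(text)
    output = _OUTPUT_RE.search(text)
    return ParsedReply(
        thinking=think.group(1).strip() if think else None,
        output=output.group(1).strip() if output else None,
    )


# Shortened stand-in for a real generation (assumption, not model output).
sample = "<think>\nR (1), R (2), R (3).\n</think>\n<output>3\n</output><|im_end|>"
reply = parse_reply(sample)
print(reply.output)
```

Returning `None` for a missing section, rather than raising, keeps the caller in control of how to handle replies that do not follow the format.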
59
+ # System prompt
60
+ (Important: this ensures the model always produces both its thinking and its output.)
61
+ ```
62
+ respond in the following format:
63
+ <think>
64
+ ...
65
+ </think>
66
+ <output>
67
+ ...
68
+ </output>
69
+ ```
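As a rough illustration of how the system prompt and the chat format fit together, the Python sketch below assembles a raw prompt string by hand. It assumes the system turn uses the same `<|im_start|>` / `<|im_end|>` delimiters as the user and assistant turns shown in the Format section; the README does not show a system turn, so treat that as an assumption. `build_prompt` is a hypothetical helper, not part of any library.

```python
# System prompt copied from the README.
SYSTEM_PROMPT = """respond in the following format:
<think>
...
</think>
<output>
...
</output>"""


def build_prompt(user_message: str, system_prompt: str = SYSTEM_PROMPT) -> str:
    """Build a raw prompt ending at the start of the assistant turn.

    Assumption: the system role is delimited the same way as the user
    and assistant roles shown in the Format section.
    """
    return (
        f"<|im_start|>system\n{system_prompt}<|im_end|>\n"
        f"<|im_start|>user\n{user_message}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


prompt = build_prompt("How many R's in strawberry")
print(prompt)
```

Most runtimes apply the model's chat template automatically, so building the string manually like this is mainly useful for debugging or for backends that accept only raw text.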
70
+ # Examples:
71
+ All responses below were generated with our system prompt and a temperature of 0.7.
72
+ Generated inside the Android application ChatterUI via GGUF Q8, using the model's prompt format. and our
73
+ 1)
74
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/5veZJmkjuv_7W7pKhvsu0.png)
75
+ 2)
76
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/pAwPdVkEZ7rnFf-TZ5tMU.png)
77
+ 3)
78
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/6710ba6af1279fe0dfe33afe/FDaWAAqgv2kvoZvjl8gjl.png)
79
+
80
  # Uploaded model
81
 
82
  - **Developed by:** Pinkstack
83
  - **License:** apache-2.0
84
+ - **Finetuned from model:** HuggingFaceTB/SmolLM2-1.7B-Instruct