Text Generation
GGUF
English
Chinese
MOE
Qwen 2.5 MOE
Mixture of Experts
Uncensored
2X7B
deepseek
reasoning
thinking
creative
128k context
general usage
problem solving
brainstorming
solve riddles
story generation
plot generation
storytelling
fiction story
story
writing
fiction
Qwen 2.5
mergekit
Inference Endpoints
conversational
Update README.md
README.md CHANGED
@@ -41,7 +41,9 @@ creating a 19B model with the "Abliterated" (Uncensored) version of Deepseek Qwe
 
 The model is just over 19B because of the unique "shared expert" (roughly 2.5 models here) used in Qwen MOEs.
 
-
+This "oddball" configuration yields interesting "thinking/reasoning" which is stronger than either 7B model on its own.
+
+And you can use any temp settings you want (rather than a narrow range of .4 to .8), and the model will still "think/reason".
 
 Five example generations at the bottom of this page.
 
@@ -112,6 +114,17 @@ SOFTWARE patch (by me) for Silly Tavern (front end to connect to multiple AI app
 
 ---
 
+Known Issues:
+
+---
+
+From time to time the model will generate some Chinese symbols/characters, especially at higher temps. This is normal
+for DeepSeek Distill models.
+
+Reasoning/Thinking may be a little "odd" at temps 1.5+; you may need to regen to get a better response.
+
+---
+
 <h2>Example Generation:</h2>
 
 IQ4XS Quant, Temp 1.5, rep pen 1.06, topp: .95, minp: .05, topk: 40
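For reference, the sampler settings quoted above (Temp 1.5, rep pen 1.06, top_p .95, min_p .05, top_k 40) could be reproduced outside of Silly Tavern. Below is a minimal sketch assuming llama-cpp-python with a locally downloaded IQ4_XS GGUF; the file name and prompt are placeholders (not from the model card), and min_p support assumes a reasonably recent llama-cpp-python build.

```python
# Sketch only: placeholder GGUF path and prompt; assumes llama-cpp-python is installed.
from llama_cpp import Llama

llm = Llama(
    model_path="model-IQ4_XS.gguf",  # placeholder path to the IQ4_XS quant
    n_ctx=8192,                      # context window; raise if your RAM/VRAM allows
)

# Sampler settings from the example generation header:
# Temp 1.5, rep pen 1.06, top_p .95, min_p .05, top_k 40
out = llm(
    "Write the opening scene of a thriller set on a night train.",  # placeholder prompt
    max_tokens=512,
    temperature=1.5,
    repeat_penalty=1.06,
    top_p=0.95,
    min_p=0.05,
    top_k=40,
)
print(out["choices"][0]["text"])
```

Per the note above, temperatures well outside the usual .4 to .8 range should still produce "thinking/reasoning" output, though at 1.5+ a regen may occasionally be needed.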