---
license: apache-2.0
language:
- en
- zh
tags:
- MOE
- Qwen 2.5 MOE
- Mixture of Experts
- Uncensored
- 2X7B
- deepseek
- reasoning
- thinking
- creative
- 128k context
- general usage
- problem solving
- brainstorming
- solve riddles
- story generation
- plot generation
- storytelling
- fiction story
- story
- writing
- fiction
- Qwen 2.5
- mergekit
pipeline_tag: text-generation
---

(quants uploading, examples to be added)

<H2>Qwen2.5-MOE-2X7B-DeepSeek-Abliterated-Censored-15B-gguf</H2>

<img src="qwen-tiny.jpg" style="float:right; width:300px; height:300px; padding:5px;">

This is a Qwen2.5 MOE (Mixture of Experts) model composed of TWO Qwen 2.5 DeepSeek (Censored/Normal AND Uncensored) 7B models, creating a 15B model with the "Abliterated" (Uncensored) version of DeepSeek Qwen 2.5 7B "in charge", so to speak.

The model is just over 15B because of the unique "shared expert" (roughly 2.5 models' worth of parameters here) used in Qwen MOEs.

This oddball configuration yields interesting "thinking/reasoning" that is stronger than either 7B model on its own.

Example generations at the bottom of this page.

This model can be used for all use cases, and is also (mostly) uncensored.

Context: 128k.

You need to use the "Jinja Template" encoded in the GGUF to use this model. You may be able to use the Llama 3 and/or ChatML templates
if your AI/LLM app cannot access the "Jinja Template".

In LMStudio the "Jinja Template" should load by default.

In other apps - use the Deepseek Tokenizer and/or "Jinja Template".
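
Below is a minimal loading sketch using llama-cpp-python, one common way to run GGUF files from Python; recent versions apply the Jinja chat template embedded in the GGUF automatically when no explicit chat format is set. The quant filename is hypothetical - substitute whichever quant you downloaded.

```python
from llama_cpp import Llama

# Hypothetical quant filename - replace with your actual downloaded file.
llm = Llama(
    model_path="Qwen2.5-MOE-2X7B-DeepSeek-Abliterated-Censored-15B-Q6_K.gguf",
    n_ctx=8192,  # 4k minimum, 8k+ suggested (see notes below)
)

# With no chat_format specified, the GGUF's embedded Jinja template is used.
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Explain step by step: what is 15% of 240?"}],
    temperature=0.6,  # inside the suggested .4-.8 reasoning range
    max_tokens=1024,
)
print(out["choices"][0]["message"]["content"])
```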

This model contains 2 times the power of DeepSeek Distill reasoning/thinking and shows exceptional performance.

Also, the DeepSeek Qwen 7B model is based on Qwen's 7B Math model, so this model is slanted more towards math/logic problem solving,
and I would say more "sciency" too.

This does not mean it will not work for your use case.

Also, because of how this model works (uncensored and censored in the same model), you may want to try 1-4 generations per prompt depending
on your use case, because even the "right" response will vary widely, and in many cases be more "interesting".
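
A hedged sketch of that multi-generation approach, reusing the `llm` object from the loading example above:

```python
# Collect several candidates for one prompt, since responses vary widely
# run to run; pick (or merge) the one you like best.
prompt = [{"role": "user", "content": "Write ONE plot idea for a locked-room mystery."}]

candidates = []
for _ in range(4):  # 1-4 generations, per the note above
    out = llm.create_chat_completion(messages=prompt, temperature=0.6, max_tokens=512)
    candidates.append(out["choices"][0]["message"]["content"])

for i, text in enumerate(candidates, 1):
    print(f"--- Candidate {i} ---\n{text}\n")
```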

Examples below so you have some idea what this model can do.

Keep in mind this model is two 7B-parameter models working together; it will come close to, but may not match, the power of a 14B or 32B reasoning/thinking model.

However, sometimes it will generate truly "out of the park" responses.

Temp of .4 to .8 is suggested (for best reasoning/thinking); however, it will still operate at much higher temps such as 1.8, 2.6, etc.

Depending on your prompt, change temp SLOWLY: IE: .41, .42, .43 ... etc.
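
One way to script that slow sweep (a sketch, again assuming the `llm` object from the loading example; the sampler values mirror the example-generation settings further down):

```python
# Step temperature in .01 increments and compare outputs for one prompt.
prompt = [{"role": "user", "content": "A train travels 180 km in 2.5 hours. What is its average speed?"}]

for i in range(5):  # temps .40, .41, .42, .43, .44
    temp = round(0.40 + 0.01 * i, 2)
    out = llm.create_chat_completion(
        messages=prompt,
        temperature=temp,
        repeat_penalty=1.06,
        top_p=0.95,
        min_p=0.05,
        top_k=40,
        max_tokens=512,
    )
    print(f"=== temp {temp} ===")
    print(out["choices"][0]["message"]["content"])
```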

The model MAY function better if you break down the reasoning/thinking task(s) into smaller pieces:

"IE: Instead of asking for 6 plots FOR theme XYZ, ASK IT for ONE plot for theme XYZ at a time".
 
Also set context limit at 4k minimum, 8K+ suggested.

I also suggest a quant of IQ4/Q4 or higher, as larger quants will reason/think and perform much better.

If you can run Q6/Q8, please use those.

IQ4XS will give very different responses VS other quants.

---

<B> Additional Support / Documents for this model to assist with generation / performance: </B>

Document #1:

Details how to use reasoning/thinking models and get maximum performance from them, and includes links to all reasoning/thinking models - GGUF and source, as well as adapters to turn any "regular" model into a "reasoning/thinking" model.

[ https://huggingface.co/DavidAU/How-To-Use-Reasoning-Thinking-Models-and-Create-Them ]

Document #2:

Document detailing all parameters, settings, samplers and advanced samplers to use not only my models to their maximum potential, but all models (and quants) online (regardless of the repo) as well. Includes a quick start, detailed notes, notes on AI/LLM apps, and other critical information and references. A must read if you are using any AI/LLM right now.

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

Software:

SOFTWARE patch (by me) for SillyTavern (a front end that connects to multiple AI apps / APIs, such as KoboldCpp, LMStudio, Text Generation Web UI and others) to control and improve output generation of ANY AI model. It is also designed to control/wrangle some of my more "creative" models and make them perform well with little to no parameter/sampler adjustment.

[ https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE ]

---

<h2>Example Generation:</h2>

IQ4XS Quant, Temp 1.5, rep pen 1.06, top_p: .95, min_p: .05, top_k: 40

---

EXAMPLE #1:

---

<B>
  
</B>

[[[Thinking Start]]]


[[[Thinking End]]]

OUTPUT:

---

EXAMPLE #2:

---

<B>
  
</B>

[[[Thinking Start]]]


[[[Thinking End]]]

OUTPUT:

---

EXAMPLE #3:

---

<B>
  
</B>

[[[Thinking Start]]]


[[[Thinking End]]]

OUTPUT:

---

EXAMPLE #4:

---

<B>
  
</B>

[[[Thinking Start]]]


[[[Thinking End]]]

OUTPUT: