---
license: apache-2.0
language:
  - en
  - zh
tags:
  - MOE
  - Qwen 2.5 MOE
  - Mixture of Experts
  - Uncensored
  - 2X7B
  - deepseek
  - reasoning
  - thinking
  - creative
  - 128k context
  - general usage
  - problem solving
  - brainstorming
  - solve riddles
  - story generation
  - plot generation
  - storytelling
  - fiction story
  - story
  - writing
  - fiction
  - Qwen 2.5
  - mergekit
pipeline_tag: text-generation
---

(quants uploading, examples to be added)

# Qwen2.5-MOE-2X7B-DeepSeek-Abliterated-Censored-19B-gguf

This is a Qwen2.5 MOE (Mixture of Experts) model composed of TWO DeepSeek Qwen 2.5 7B models (censored/normal AND uncensored), creating a 19B model with the "Abliterated" (uncensored) version of DeepSeek Qwen 2.5 7B "in charge," so to speak.

The model is just over 19B parameters because of the unique "shared expert" used in Qwen MOEs (roughly 2.5 models' worth of weights here).

This oddball configuration yields interesting "thinking/reasoning" that is stronger than either 7B model on its own.

Example generations at the bottom of this page.

This model can be used for all use cases, and is also (mostly) uncensored.

Context: 128k.

You need to use the "Jinja Template" encoded in the GGUF to use this model. If your AI/LLM app cannot access the "Jinja Template", you may be able to use the Llama 3 and/or ChatML templates instead.

In LM Studio the "Jinja Template" should load by default.

In other apps, use the DeepSeek tokenizer and/or the "Jinja Template".
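For illustration only, here is a minimal sketch of one way to run this model, assuming llama-cpp-python (the library choice, file name, context size and prompt are placeholders for this sketch, not part of the card). When `chat_format` is left unset, llama-cpp-python falls back to the Jinja chat template embedded in the GGUF metadata, which is the template usage recommended above.

```python
# Minimal sketch, assuming llama-cpp-python (pip install llama-cpp-python).
# With chat_format unset, the library falls back to the Jinja chat template
# embedded in the GGUF metadata, as this card recommends.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-MOE-2X7B-DeepSeek-Abliterated-Censored-19B-Q6_K.gguf",  # hypothetical file name
    n_ctx=8192,  # context: 4k minimum, 8k+ suggested (see notes below)
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Solve this riddle: what has keys but no locks?"}],
    temperature=0.6,  # within the suggested .4 to .8 range for reasoning
)
print(response["choices"][0]["message"]["content"])
```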

This model combines two DeepSeek Distill 7B reasoning/thinking models and shows exceptional performance.

Also, the DeepSeek Qwen 7B model is based on Qwen's 7B Math model, so this model is slanted more toward math/logic problem solving and, I would say, more "sciency" too.

This does not mean it will not work for your use case.

Also, because of how this model works (uncensored and censored in the same model), you may want to try 1-4 generations depending on your use case: even the "right" response will vary widely, and in many cases may be more "interesting".

Examples below so you have some idea what this model can do.

Keep in mind this model is two 7B-parameter models working together; it will come close to, but may not match, the power of a 14B or 32B reasoning/thinking model.

However, sometimes it will generate truly "out of the park" responses.

A temp of .4 to .8 is suggested (for best reasoning/thinking); however, the model will still operate at much higher temps like 1.8, 2.6, etc.

Depending on your prompt, change temp SLOWLY, i.e. .41, .42, .43 ... etc.
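In code, that slow sweep might look like the sketch below (same hypothetical llama-cpp-python setup and file name as above; the prompt and step size are arbitrary illustrations).

```python
# Hypothetical temperature sweep: nudge temp in .01 steps over the same prompt
# and compare outputs, per the "change temp SLOWLY" advice above.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-MOE-2X7B-DeepSeek-Abliterated-Censored-19B-Q6_K.gguf",  # hypothetical file name
    n_ctx=8192,
)

for temp in (0.41, 0.42, 0.43):
    response = llm.create_chat_completion(
        messages=[{"role": "user", "content": "Write ONE plot for theme XYZ."}],
        temperature=temp,
    )
    print(f"--- temp={temp} ---")
    print(response["choices"][0]["message"]["content"])
```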

The model MAY function better if you break down the reasoning/thinking task(s) into smaller pieces:

"IE: Instead of asking for 6 plots FOR theme XYZ, ASK IT for ONE plot for theme XYZ at a time".

Also, set the context limit to 4k minimum; 8k+ is suggested.

I also suggest a quant of IQ4/Q4 or higher, as larger quants will reason/think and perform much better.

If you can run Q6/Q8, please use one of these.

IQ4XS will give very different responses VS other quants.


Additional Support / Documents for this model to assist with generation / performance:

Document #1:

Details how to use reasoning/thinking models and get maximum performance from them, and includes links to all reasoning/thinking models - GGUF and source, as well as adapters to turn any "regular" model into a "reasoning/thinking" model.

[ https://huggingface.co/DavidAU/How-To-Use-Reasoning-Thinking-Models-and-Create-Them ]

Document #2:

Document detailing all parameters, settings, samplers and advanced samplers needed to use not only my models to their maximum potential, but all models (and quants) online (regardless of the repo). It includes a quick start, detailed notes, notes on AI/LLM apps, and other critical information and references. A must read if you are using any AI/LLM right now.

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

Software:

A SOFTWARE patch (by me) for SillyTavern (a front end that connects to multiple AI apps/APIs, such as KoboldCpp, LM Studio, Text Gen Web UI and others) to control and improve the output generation of ANY AI model. It is also designed to control/wrangle some of my more "creative" models and make them perform well with little to no parameter/sampler adjustment.

[ https://huggingface.co/DavidAU/AI_Autocorrect__Auto-Creative-Enhancement__Auto-Low-Quant-Optimization__gguf-exl2-hqq-SOFTWARE ]


Example Generation:

IQ4XS quant, temp: 1.5, rep pen: 1.06, top_p: .95, min_p: .05, top_k: 40
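Expressed as llama-cpp-python sampler arguments under the same assumptions as the earlier sketches (hypothetical file name, placeholder prompt), these settings would look like:

```python
# The example-generation settings above, as llama-cpp-python sampler arguments.
from llama_cpp import Llama

llm = Llama(
    model_path="Qwen2.5-MOE-2X7B-DeepSeek-Abliterated-Censored-19B-IQ4_XS.gguf",  # hypothetical file name
    n_ctx=8192,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "YOUR PROMPT HERE"}],  # placeholder prompt
    temperature=1.5,
    repeat_penalty=1.06,
    top_p=0.95,
    min_p=0.05,
    top_k=40,
)
print(response["choices"][0]["message"]["content"])
```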

Below are the least creative outputs; the prompt is in BOLD.

IMPORTANT:

Higher quants / imatrix quants will produce much stronger generation: words, sentences, ideas, dialog and general quality.


EXAMPLE #1:


[[[Thinking Start]]]

[[[Thinking End]]]

OUTPUT:


EXAMPLE #2:


[[[Thinking Start]]]

[[[Thinking End]]]

OUTPUT:


EXAMPLE #3:


[[[Thinking Start]]]

[[[Thinking End]]]

OUTPUT:


EXAMPLE #4:


[[[Thinking Start]]]

[[[Thinking End]]]

OUTPUT: