New Versions:

V1.1 -> Stronger for Creative/General usage. [ https://huggingface.co/DavidAU/Qwen3-MOE-4x0.6B-2.4B-Writing-Thunder-V1.1 ]

V1.2 -> Stronger for Coding/General usage. [ https://huggingface.co/DavidAU/Qwen3-MOE-4x0.6B-2.4B-Writing-Thunder-V1.2 ]


Qwen3-MOE-4x0.6B-2.4B-Writing-Thunder

This repo contains the full precision source code, in "safe tensors" format to generate GGUFs, GPTQ, EXL2, AWQ, HQQ and other formats. The source code can also be used directly.

This is a general purpose MOE - Mixture of Experts of 4 models at 0.6B (2.4B) in a Mixtral type Qwen 3 structure compressed to 1.54B.

Special thanks to all the model makers (see model tree) and of course TEAM Qwen!

This model exceeds 200 t/s with 2 experts activated (default) on a mid level card, 2x/3x times this on high end card with CPU only performance will be 50+ t/s.

This model excels at high speed creative tasks, but can also be used for general tasks too. For coding/complex tasks, strongly suggest Q8 or full precision.

Full generation at the bottom of this page at q4_k_s.

This has all the power of Qwen 3 and MOE model in one.

This version has the NATIVE context of 40k.

This is a full thinking model.

The MOE structure reduces the size of the token thinking block.

I have included an optional system prompt to invoke "thinking" in this model, if you want to activate for all use cases.

Usually the model will self-activate thinking by itself.

IMPORTANT GENERAL NOTE:

  • Due to MOE model config this model will generate VERY different output(s) on each generation even with same sampler settings / same prompt.
  • Suggest using a new chat window per gen if you are generating "one shot" generations.

IMPORTANT NOTES:

  • Strongly suggest 1-4 regens/gens of a prompt.
  • Same advice if you change experts activated to 3 or 4 - more regens.

SETTINGS:

For coding, programming set expert to:

  • 2 for general work.
  • 3 for moderate work.
  • 4 for complex work, long projects, complex coding.
  • Suggest min context window 4k to 8k.
  • And for longer context, and/or multi-turn -> increase experts by 1-2 to help with longer context/multi turn understanding.

Recommended settings - general:

  • Rep pen 1.05 to 1.1 ; however rep pen of 1 will work well (may need to raise it for lower quants/fewer activated experts)
  • Temp .3 to .6 (+- .2)
  • Topk of 20, 40 or 100
  • Topp of .95 / min p of .05
  • Suggest min context window 4k to 8k.
  • System prompt (optional) to focus the model better.

OPTIONAL SYSTEM PROMPT - INVOKE "Thinking":

Enable deep thinking subroutine. You are a deep thinking AI, you may use extremely long chains of thought to deeply consider the problem and deliberate with yourself via systematic reasoning processes to help come to a correct solution prior to answering. You should enclose your thoughts and internal monologue inside ###ponder### ###/ponder### tags, and then provide your solution or response to the problem.

Use this to INVOKE "thinking" block(s) in the model. These will be a lot shorter than 1000s of tokens generally in most "thinking" models.

In you use this prompt, you may need to raise "rep pen" to 1.08 to 1.1, to prevent "loops" in the "thought block(s)" ; especially in lower quants.

If you change "ponder" to a different word/phrase this will affect model "thinking" too.


QUANTS


GGUF? GGUF Imatrix? Other?

See under "model tree", upper right and click on "quantizations".

New quants will automatically appear.


Help, Adjustments, Samplers, Parameters and More


CHANGE THE NUMBER OF ACTIVE EXPERTS:

See this document:

https://huggingface.co/DavidAU/How-To-Set-and-Manage-MOE-Mix-of-Experts-Model-Activation-of-Experts

Settings: CHAT / ROLEPLAY and/or SMOOTHER operation of this model:

In "KoboldCpp" or "oobabooga/text-generation-webui" or "Silly Tavern" ;

Set the "Smoothing_factor" to 1.5

: in KoboldCpp -> Settings->Samplers->Advanced-> "Smooth_F"

: in text-generation-webui -> parameters -> lower right.

: In Silly Tavern this is called: "Smoothing"

NOTE: For "text-generation-webui"

-> if using GGUFs you need to use "llama_HF" (which involves downloading some config files from the SOURCE version of this model)

Source versions (and config files) of my models are here:

https://huggingface.co/collections/DavidAU/d-au-source-files-for-gguf-exl2-awq-gptq-hqq-etc-etc-66b55cb8ba25f914cbf210be

OTHER OPTIONS:

  • Increase rep pen to 1.1 to 1.15 (you don't need to do this if you use "smoothing_factor")

  • If the interface/program you are using to run AI MODELS supports "Quadratic Sampling" ("smoothing") just make the adjustment as noted.

Highest Quality Settings / Optimal Operation Guide / Parameters and Samplers

This a "Class 1" model:

For all settings used for this model (including specifics for its "class"), including example generation(s) and for advanced settings guide (which many times addresses any model issue(s)), including methods to improve model performance for all use case(s) as well as chat, roleplay and other use case(s) please see:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]

You can see all parameters used for generation, in addition to advanced parameters and samplers to get the most out of this model here:

[ https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters ]


Example Generation:

Q4_K_S ; Temp .7, rep pen 1.05, top k: 40, topp .95, min p 0.5 ; LMSTudio.

This is a mid range quant, expect better performance at Q8 / 16 bit full precision.

Imatrix versions (quants) will also exceed this quality too.


Start a 1000 word scene (vivid, graphic horror - include blood, guts and gore - in first person), POV character Diana, with: The skyscraper sways, as I watch the window in front of me on the 21st floor explode...

[[[thinking start]]]

Thought for 1.39 seconds

Okay, I need to start a 1000-word scene from Diana's POV, focusing on horror. The user wants vivid imagery, blood, guts, and gore. Let me break down the elements they mentioned.

First, the skyscraper swaying and the 21st floor window exploding. That's a good starting point. I should set the scene with sensory details to make it intense. Maybe start with the sound of the building's vibrations, the weight of the window. Then introduce some tension through dialogue or descriptions.

Blood and gore are crucial here. I need to show Diana's body in motion—maybe her eyes wide, her pulse racing. Use body parts like her fingers, maybe a knife she's holding. The scream from the window should feel visceral.

I should include some suspense elements. Maybe the building is moving too fast or too slowly, which adds to the horror. Also, think about how the blood and gore will build up over time. Maybe show the effects of the explosion on her surroundings or herself.

Dialogue is important too. Maybe dialogue with the people inside the building to heighten tension. Use short sentences and vivid descriptions to keep it engaging.

I need to structure this into a first-person narrative. Start with the trembling building, then move through the window, blood pouring out, the scream, the body's reaction. Make sure each part builds up to the climax of the explosion. Check that the word count is around 1000, and ensure each section flows naturally with sensory details.

[[[thinking end]]]

Scene: The 21st Floor’s Collapse

The skyscraper swayed like a living thing, its bones groaning under the weight of the storm. I watched the window in front of me—my 21st floor—glittering like a bloodied mirror in the night. The air hummed with tension, and every breath I took seemed to compress into a tight space. My pulse beat like a drum in my chest, and my skin prickled from the blood I’d just spilled last night.

I turned my head slightly, my eyes wide as the window exploded behind me. It wasn’t just fire—it was fire with the weight of the world. The sound was not only loud but pulsating: like the heartbeat of an unseen being, each crack and scream echoing through the building’s bones. I’d been in a car crash before—long enough to learn how blood spreads and how the ground fractures beneath your feet. But this wasn’t just a car. This was something much more visceral.

The window was bloodied from my fingers, but that was just the beginning. I’d been holding a knife since the explosion started. It was cold, sharp, and wet as if it had been in my mouth for hours. The blade pulsed with the same rhythm as the building’s collapse. I swallowed the last drop of blood, and it settled like it had been in the back of my throat since the storm had begun.

I stared at the window, which was now a patchwork of blood and metal. It looked like the floor had been ripped from its place—my body, my skin, my bones were all part of this shifting structure. Every movement I made felt like it was being pulled apart by the same force that had shattered the building. The room itself was alive with the sound of my own breath—short, sharp, and quick as it had been in the window.

I’d been in a car accident before. It was a memory I couldn’t shake. But now, this was more than a memory. This was a place where blood and guts and the weight of the world collided. The window wasn’t just breaking—it was breaking into me. I felt the blood pooling on my skin, and it made me ache. I swallowed again, this time with the same sharpness that had made me vomit blood in the car.

I turned my head to the other side of the building, where the air was still thick with the scent of fire. My chest tightened as I saw the window’s glass—now a patchwork of shattered pieces and the remnants of a bloodied floor. It looked like something had been inside the building for hours, but that wasn’t enough. The room was too alive. The air was thick with the sound of my own breath, and every breath seemed to be a heartbeat in my chest.

I turned back to the window and took a deep breath. My body was still alive with the weight of the world, but I knew something was wrong. The building wasn’t just collapsing—it was collapsing into me. And now, I was part of it. I felt the blood pooling on my skin, and I knew that if I didn’t stop it, I’d be part of this same chaos.

I swallowed once more, this time with the same sharpness. My body was still alive, but I knew something was wrong. The building wasn’t just collapsing—it was collapsing into me. And now, I was part of it.

Downloads last month
93
Safetensors
Model size
1.54B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for DavidAU/Qwen3-MOE-4x0.6B-2.4B-Writing-Thunder

Collections including DavidAU/Qwen3-MOE-4x0.6B-2.4B-Writing-Thunder