---
base_model:
- Qwen/Qwen3-8B
- Qwen/Qwen3-0.6B
- Qwen/Qwen3-4B
- Qwen/Qwen3-1.7B
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- Light weight
- Agentic
- Conversational
---

# Qwen3 Quantized Models – Lexicons Edition

This repository provides quantized versions of the Qwen3 language models, optimized for efficient deployment on edge devices and low-resource environments. The following models have been added to our Lexicons Model Zoo:

- Qwen_Qwen3-0.6B-Q4_K_M
- Qwen_Qwen3-1.7B-Q4_K_M
- Qwen_Qwen3-4B-Q4_K_M
- Qwen3-8B-Q4_K_M

## Model Overview

Qwen3 is the latest open-source LLM series developed by Alibaba Group. Released on April 28, 2025, the models were trained on 36 trillion tokens spanning 119 languages and dialects. Qwen3 models are instruction-tuned and offer long context windows and strong multilingual capabilities. The behavior of Qwen3 under quantization is analyzed in [An Empirical Study of Qwen3 Quantization](https://arxiv.org/abs/2505.02214).

The quantized versions provided here use 4-bit Q4_K_M precision, delivering high performance at a fraction of the memory and compute cost. These models are well suited to real-time inference, chatbots, and on-device applications.


## Key Features

- **Efficient Quantization:** 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
- **Multilingual Mastery:** Trained on a massive, diverse corpus covering 119 languages.
- **Instruction-Tuned:** Fine-tuned to follow user instructions effectively.
- **Scalable Sizes:** Choose from 0.6B to 8B parameter models based on your use case.

## Available Quantized Versions

| Model Name | Parameters | Quantization | Context Length | Recommended Use |
|---|---|---|---|---|
| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B | Q4_K_M | 4K tokens | Lightweight devices, microservices |
| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B | Q4_K_M | 4K tokens | Fast inference, chatbots |
| Qwen_Qwen3-4B-Q4_K_M | 4B | Q4_K_M | 4K tokens | Balanced performance & efficiency |
| Qwen3-8B-Q4_K_M | 8B | Q4_K_M | 128K tokens | Complex reasoning, long documents |
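As a rough sanity check when choosing a size, Q4_K_M averages on the order of 4.85 bits per weight (an approximation; the format mixes 4- and 6-bit blocks, so real GGUF files differ by a few percent). A back-of-the-envelope file-size estimate, with a helper function name of our own choosing:

```python
def q4_k_m_size_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Estimate the on-disk size (GB) of a Q4_K_M GGUF model.

    bits_per_weight is an approximate average for Q4_K_M; actual
    files vary slightly depending on the per-tensor quant mix.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Estimated sizes for the four models in this repo
for name, params in [("Qwen3-0.6B", 0.6), ("Qwen3-1.7B", 1.7),
                     ("Qwen3-4B", 4.0), ("Qwen3-8B", 8.0)]:
    print(f"{name}: ~{q4_k_m_size_gb(params):.2f} GB")
```

By this estimate the 0.6B model fits comfortably under 1 GB, while the 8B model needs roughly 5 GB of storage (plus working memory for the KV cache at inference time).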

## Performance Insights

Quantized Qwen3 models at Q4_K_M retain strong reasoning and comprehension capabilities while substantially reducing memory and compute requirements. According to [arXiv:2505.02214](https://arxiv.org/abs/2505.02214), Qwen3 models remain robust even under lower-bit quantization when applied appropriately.

## Code

The project is released on GitHub and Hugging Face.
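As a minimal usage sketch, GGUF files like these are commonly loaded with the `llama-cpp-python` bindings (assumed installed via `pip install llama-cpp-python`; the model filename below is illustrative, so adjust the path to wherever you downloaded the file):

```python
from pathlib import Path

# Illustrative local path to one of the quantized files from this repo
MODEL = Path("Qwen_Qwen3-4B-Q4_K_M.gguf")

def generate(prompt: str, max_tokens: int = 128) -> str:
    """Run a single completion against a local Q4_K_M GGUF model."""
    from llama_cpp import Llama  # deferred so the file is optional at import time
    llm = Llama(model_path=str(MODEL), n_ctx=4096, verbose=False)
    out = llm(prompt, max_tokens=max_tokens)
    return out["choices"][0]["text"]

if MODEL.exists():
    print(generate("Explain quantization in one sentence."))
```

The same files also work with the stock `llama.cpp` CLI and other GGUF-compatible runtimes; this Python sketch is just one common path for chatbot-style integration.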