---
base_model:
- Qwen/Qwen3-8B
- Qwen/Qwen3-0.6B
- Qwen/Qwen3-4B
- Qwen/Qwen3-1.7B
library_name: transformers
license: apache-2.0
pipeline_tag: text-generation
tags:
- Light weight
- Agentic
- Conversational
---

# Qwen3 Quantized Models – Lexicons Edition

This repository provides quantized versions of the Qwen3 language models, optimized for efficient deployment on edge devices and low-resource environments. The following models have been added to our Lexicons Model Zoo:

- Qwen_Qwen3-0.6B-Q4_K_M
- Qwen_Qwen3-1.7B-Q4_K_M
- Qwen_Qwen3-4B-Q4_K_M
- Qwen3-8B-Q4_K_M

## Model Overview

Qwen3 is the latest open-source LLM series developed by Alibaba Group. Released on April 28, 2025, the models were trained on 36 trillion tokens spanning 119 languages and dialects. Qwen3 models are instruction-tuned and offer long context windows and strong multilingual capabilities. The behavior of Qwen3 under quantization is analyzed in [An Empirical Study of Qwen3 Quantization](https://arxiv.org/abs/2505.02214).

The quantized versions provided here use 4-bit Q4_K_M precision, delivering high performance at a fraction of the memory and compute cost. These models are well suited to real-time inference, chatbots, and on-device applications.


## Key Features

- **Efficient Quantization:** 4-bit quantized models (Q4_K_M) for faster inference and lower memory usage.
- **Multilingual Mastery:** Trained on a massive, diverse corpus covering 119 languages.
- **Instruction-Tuned:** Fine-tuned to follow user instructions effectively.
- **Scalable Sizes:** Choose from 0.6B to 8B parameter models based on your use case.

## Available Quantized Versions

| Model Name | Parameters | Quantization | Context Length | Recommended Use |
|---|---|---|---|---|
| Qwen_Qwen3-0.6B-Q4_K_M | 0.6B | Q4_K_M | 4K tokens | Lightweight devices, microservices |
| Qwen_Qwen3-1.7B-Q4_K_M | 1.7B | Q4_K_M | 4K tokens | Fast inference, chatbots |
| Qwen_Qwen3-4B-Q4_K_M | 4B | Q4_K_M | 4K tokens | Balanced performance & efficiency |
| Qwen3-8B-Q4_K_M | 8B | Q4_K_M | 128K tokens | Complex reasoning, long documents |
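As a rough sanity check when choosing a size, Q4_K_M averages on the order of 4.85 bits per weight (an approximation; the format mixes 4- and 6-bit blocks, so real GGUF files differ by a few percent). A back-of-the-envelope file-size estimate, with a helper function name of our own choosing:

```python
def q4_k_m_size_gb(params_billion: float, bits_per_weight: float = 4.85) -> float:
    """Estimate the on-disk size (GB) of a Q4_K_M GGUF model.

    bits_per_weight is an approximate average for Q4_K_M; actual
    files vary slightly depending on the per-tensor quant mix.
    """
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Estimated sizes for the four models in this repo
for name, params in [("Qwen3-0.6B", 0.6), ("Qwen3-1.7B", 1.7),
                     ("Qwen3-4B", 4.0), ("Qwen3-8B", 8.0)]:
    print(f"{name}: ~{q4_k_m_size_gb(params):.2f} GB")
```

By this estimate the 0.6B model fits comfortably under 1 GB, while the 8B model needs roughly 5 GB of storage (plus working memory for the KV cache at inference time).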

## Performance Insights

Quantized Qwen3 models at Q4_K_M retain strong reasoning and comprehension capabilities while substantially reducing memory and compute requirements. According to [arXiv:2505.02214](https://arxiv.org/abs/2505.02214), Qwen3 models remain robust even under lower-bit quantization when applied appropriately.

## Code

The project is released on GitHub and Hugging Face.
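As a minimal usage sketch, GGUF files like these are commonly loaded with the `llama-cpp-python` bindings (assumed installed via `pip install llama-cpp-python`; the model filename below is illustrative, so adjust the path to wherever you downloaded the file):

```python
from pathlib import Path

# Illustrative local path to one of the quantized files from this repo
MODEL = Path("Qwen_Qwen3-4B-Q4_K_M.gguf")

def generate(prompt: str, max_tokens: int = 128) -> str:
    """Run a single completion against a local Q4_K_M GGUF model."""
    from llama_cpp import Llama  # deferred so the file is optional at import time
    llm = Llama(model_path=str(MODEL), n_ctx=4096, verbose=False)
    out = llm(prompt, max_tokens=max_tokens)
    return out["choices"][0]["text"]

if MODEL.exists():
    print(generate("Explain quantization in one sentence."))
```

The same files also work with the stock `llama.cpp` CLI and other GGUF-compatible runtimes; this Python sketch is just one common path for chatbot-style integration.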