NanoQuant Compressed Model

Model Description

This is a compressed version of tencent/Hunyuan-MT-7B created using NanoQuant, an advanced LLM compression toolkit.

Compression Details

  • Compression Level: light
  • Size Reduction: 65.0%
  • Techniques Used:
    • Quantization: 8bit
    • Pruning: magnitude
    • LoRA: r=64, alpha=32, dropout=0.05
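To make the first two techniques concrete, here is a minimal pure-Python sketch of 8-bit absmax quantization and magnitude pruning. This is an illustration of the general methods, not NanoQuant's actual implementation; the function names and the per-tensor scaling scheme are assumptions. (The LoRA parameters above configure low-rank adapters and are not sketched here.)

```python
def quantize_8bit(weights):
    """Map float weights to int8 range [-127, 127] with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

def prune_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k] if k > 0 else 0.0
    return [0.0 if abs(w) < threshold else w for w in weights]

weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6]
q, scale = quantize_8bit(weights)       # ints in [-127, 127] plus one scale
restored = dequantize(q, scale)         # close to the original weights
pruned = prune_magnitude(weights, 0.5)  # half the weights set to zero
```

In a real pipeline these operate per tensor (or per channel) on full weight matrices; storing one byte per weight instead of two is where most of the size reduction comes from.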

Deployment Options

Option 1: Direct Usage with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the compressed checkpoint from the local output directory
model = AutoModelForCausalLM.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_light")
tokenizer = AutoTokenizer.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_light")
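The directory name above appears to be derived mechanically from the original repo id and the compression level. A small helper sketching that naming scheme (the scheme is inferred from the names shown on this card, not documented by NanoQuant):

```python
def nanoquant_dir(repo_id: str, level: str) -> str:
    """Build the local output directory name for a compressed checkpoint.

    Inferred convention: replace '/' in the repo id with '_' and append
    the '_nanoquant_<level>' suffix.
    """
    return f"{repo_id.replace('/', '_')}_nanoquant_{level}"

path = nanoquant_dir("tencent/Hunyuan-MT-7B", "light")
# path == "tencent_Hunyuan-MT-7B_nanoquant_light"
```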

Option 2: Ollama Deployment

This model is also available for Ollama:

ollama pull nanoquant-tencent-Hunyuan-MT-7B:light
ollama run nanoquant-tencent-Hunyuan-MT-7B:light

Performance Characteristics

Due to the compression, this model:

  • Requires significantly less storage space
  • Has faster loading times
  • Uses less memory during inference
  • Maintains most of the original model's capabilities
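As a rough illustration of the storage claim, the 65% size reduction stated above can be turned into approximate numbers. Both inputs are back-of-envelope assumptions (2 bytes per parameter for an FP16 checkpoint, ~7B parameters from the model name), not measurements:

```python
PARAMS = 7e9          # approximate parameter count (from "7B" in the model name)
BYTES_PER_PARAM = 2   # FP16 storage (assumption)
REDUCTION = 0.65      # size reduction reported on this card

original_gb = PARAMS * BYTES_PER_PARAM / 1e9
compressed_gb = original_gb * (1 - REDUCTION)
print(f"original ~ {original_gb:.1f} GB, compressed ~ {compressed_gb:.1f} GB")
```

So a roughly 14 GB FP16 checkpoint would shrink to about 4.9 GB on disk; actual sizes depend on the exact parameter count and serialization overhead.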

Original Model

For information about the original model, please visit: https://huggingface.co/tencent/Hunyuan-MT-7B

License

This model is released under the Apache 2.0 license.

NanoQuant

NanoQuant is an advanced model compression system that achieves up to 99.95% size reduction while maintaining model performance. Learn more at NanoQuant Documentation.

Safetensors

  • Model size: 8B params
  • Tensor type: F16