# NanoQuant Compressed Model

## Model Description
This is a compressed version of [tencent/Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B), created using NanoQuant, an LLM compression toolkit.
## Compression Details

- Compression Level: light
- Size Reduction: 65.0%
- Techniques Used:
  - Quantization: 8-bit
  - Pruning: magnitude-based
  - LoRA: r=64, alpha=32, dropout=0.05
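As a rough illustration of what these settings mean, here is a toy sketch of absmax 8-bit quantization and magnitude pruning on a small weight list. This uses assumed textbook definitions of both techniques; it is not the NanoQuant implementation, which this card does not publish.

```python
# Toy sketch of 8-bit quantization and magnitude pruning.
# Illustrative only -- textbook definitions, not NanoQuant's internals.

def quantize_8bit(weights):
    """Absmax quantization: scale floats into the int8 range [-127, 127]."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int values and the scale."""
    return [q * scale for q in quantized]

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction of the weights."""
    k = int(len(weights) * sparsity)
    keep = sorted(range(len(weights)), key=lambda i: abs(weights[i]))[k:]
    pruned = [0.0] * len(weights)
    for i in keep:
        pruned[i] = weights[i]
    return pruned

weights = [0.8, -1.27, 0.02, -0.4, 0.05, 1.1]
q, scale = quantize_8bit(weights)
restored = dequantize(q, scale)      # close to the original weights
pruned = magnitude_prune(weights, 0.5)  # the three smallest weights zeroed
```

Real pipelines apply this per tensor (or per channel) and keep the LoRA adapters in higher precision to recover accuracy lost to quantization and pruning.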
## Deployment Options

### Option 1: Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_light")
tokenizer = AutoTokenizer.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_light")
```
### Option 2: Ollama Deployment

This model is also available for Ollama:

```shell
# Download the model, then start an interactive session
ollama pull nanoquant-tencent-Hunyuan-MT-7B:light
ollama run nanoquant-tencent-Hunyuan-MT-7B:light
```
## Performance Characteristics

Due to the compression, this model:

- Requires significantly less storage space (65% smaller)
- Loads faster
- Uses less memory during inference
- Retains most of the original model's capabilities, though some quality loss relative to the full-precision weights is possible
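The 65% figure translates into a simple back-of-the-envelope disk estimate. The calculation below assumes the original 7B parameters were stored as fp16 (2 bytes per parameter); the card does not state the original dtype, so treat the absolute numbers as indicative only.

```python
# Back-of-the-envelope size estimate. Assumptions (not stated in this card):
# 7e9 parameters stored as fp16 (2 bytes each) in the original checkpoint.
params = 7e9
original_gb = params * 2 / 1e9            # ~14.0 GB before compression
compressed_gb = original_gb * (1 - 0.65)  # ~4.9 GB at the stated 65% reduction
print(f"{original_gb:.1f} GB -> {compressed_gb:.1f} GB")
```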
## Original Model

For information about the original model, see: https://huggingface.co/tencent/Hunyuan-MT-7B
## License

This model is released under the Apache 2.0 license.
## NanoQuant

NanoQuant is a model compression system that achieves up to 99.95% size reduction while aiming to preserve model performance. Learn more in the NanoQuant documentation.