NanoQuant Compressed Model

Model Description

This is a compressed version of tencent/Hunyuan-MT-7B created using NanoQuant, an advanced LLM compression toolkit.

Compression Details

  • Compression Level: light
  • Size Reduction: 65.0%
  • Techniques Used:
    • Quantization: 8bit
    • Pruning: magnitude
    • LoRA: r=64, alpha=32, dropout=0.05
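To make the first two techniques concrete, here is a minimal pure-Python sketch of 8-bit absmax quantization and magnitude pruning. This is an illustration of the general methods, not NanoQuant's actual implementation; the function names and the per-tensor scaling scheme are assumptions. (The LoRA parameters above configure low-rank adapters and are not sketched here.)

```python
def quantize_8bit(weights):
    """Map float weights to int8 range [-127, 127] with a per-tensor scale."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

def prune_magnitude(weights, sparsity):
    """Zero out the fraction `sparsity` of weights with smallest magnitude."""
    k = int(len(weights) * sparsity)
    threshold = sorted(abs(w) for w in weights)[k] if k > 0 else 0.0
    return [0.0 if abs(w) < threshold else w for w in weights]

weights = [0.8, -0.05, 0.3, -1.2, 0.01, 0.6]
q, scale = quantize_8bit(weights)       # ints in [-127, 127] plus one scale
restored = dequantize(q, scale)         # close to the original weights
pruned = prune_magnitude(weights, 0.5)  # half the weights set to zero
```

In a real pipeline these operate per tensor (or per channel) on full weight matrices; storing one byte per weight instead of two is where most of the size reduction comes from.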

Deployment Options

Option 1: Direct Usage with Transformers

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the compressed checkpoint from the local output directory
model = AutoModelForCausalLM.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_light")
tokenizer = AutoTokenizer.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_light")
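The directory name above appears to be derived mechanically from the original repo id and the compression level. A small helper sketching that naming scheme (the scheme is inferred from the names shown on this card, not documented by NanoQuant):

```python
def nanoquant_dir(repo_id: str, level: str) -> str:
    """Build the local output directory name for a compressed checkpoint.

    Inferred convention: replace '/' in the repo id with '_' and append
    the '_nanoquant_<level>' suffix.
    """
    return f"{repo_id.replace('/', '_')}_nanoquant_{level}"

path = nanoquant_dir("tencent/Hunyuan-MT-7B", "light")
# path == "tencent_Hunyuan-MT-7B_nanoquant_light"
```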

Option 2: Ollama Deployment

This model is also available for Ollama:

ollama pull nanoquant-tencent-Hunyuan-MT-7B:light
ollama run nanoquant-tencent-Hunyuan-MT-7B:light

Performance Characteristics

Due to the compression, this model:

  • Requires significantly less storage space
  • Has faster loading times
  • Uses less memory during inference
  • Maintains most of the original model's capabilities
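As a rough illustration of the storage claim, the 65% size reduction stated above can be turned into approximate numbers. Both inputs are back-of-envelope assumptions (2 bytes per parameter for an FP16 checkpoint, ~7B parameters from the model name), not measurements:

```python
PARAMS = 7e9          # approximate parameter count (from "7B" in the model name)
BYTES_PER_PARAM = 2   # FP16 storage (assumption)
REDUCTION = 0.65      # size reduction reported on this card

original_gb = PARAMS * BYTES_PER_PARAM / 1e9
compressed_gb = original_gb * (1 - REDUCTION)
print(f"original ~ {original_gb:.1f} GB, compressed ~ {compressed_gb:.1f} GB")
```

So a roughly 14 GB FP16 checkpoint would shrink to about 4.9 GB on disk; actual sizes depend on the exact parameter count and serialization overhead.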

Original Model

For information about the original model, please visit: https://huggingface.co/tencent/Hunyuan-MT-7B

License

This model is released under the Apache 2.0 license.

NanoQuant

NanoQuant is an advanced model compression system that achieves up to 99.95% size reduction while maintaining model performance. Learn more at NanoQuant Documentation.

Safetensors

  • Model size: 8B params
  • Tensor type: F16