---
language: en
tags:
  - llm
  - compression
  - nanoquant
  - quantization
  - pruning
license: apache-2.0
datasets: []
model-index: []
---

# NanoQuant Compressed Model

## Model Description

This is a compressed version of [tencent/Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B), created with NanoQuant, an LLM compression toolkit.

## Compression Details

- **Compression Level:** light
- **Size Reduction:** 65.0%
- **Techniques Used:**
  - Quantization: 8-bit
  - Pruning: magnitude
  - LoRA: `r=64, alpha=32, dropout=0.05`
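
To make the techniques above concrete, here is a minimal NumPy sketch of per-tensor 8-bit quantization and magnitude pruning. It is illustrative only: the function names are my own, and this is not the actual NanoQuant implementation.

```python
import numpy as np

def quantize_8bit(w):
    # Symmetric per-tensor 8-bit quantization: map floats to int8 via one scale.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def magnitude_prune(w, sparsity=0.5):
    # Zero out the smallest-magnitude weights until `sparsity` of them are zero.
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    threshold = np.sort(np.abs(w), axis=None)[k - 1]
    pruned = w.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

w = np.array([[0.8, -0.05, 0.3], [-0.9, 0.02, 0.6]])
pruned = magnitude_prune(w, sparsity=0.5)   # half the weights become zero
q, scale = quantize_8bit(pruned)            # int8 weights plus one fp scale
```

In practice the two combine well: pruning first removes low-magnitude weights, and quantization then stores the survivors in one byte each; the LoRA adapters listed above recover quality lost in these steps.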

## Deployment Options

### Option 1: Direct Usage with Transformers

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_light")
tokenizer = AutoTokenizer.from_pretrained("tencent_Hunyuan-MT-7B_nanoquant_light")
```

### Option 2: Ollama Deployment

This model is also available through Ollama:

```bash
ollama pull nanoquant-tencent-Hunyuan-MT-7B:light
```
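
Once pulled, the model can be queried through Ollama's local HTTP API (by default at `http://localhost:11434`). The sketch below builds a request body for the `/api/generate` endpoint; the helper names are my own, and actually sending the request requires a running Ollama server.

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default endpoint

def build_generate_body(model, prompt):
    # `"stream": False` asks Ollama for a single JSON object instead of a stream.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()

def generate(model, prompt):
    # Sends the request to a running Ollama server (not executed here).
    req = request.Request(OLLAMA_URL, data=build_generate_body(model, prompt),
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

body = build_generate_body("nanoquant-tencent-Hunyuan-MT-7B:light",
                           "Translate to German: Good morning.")
```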

## Performance Characteristics

Because it is compressed, this model:

- Requires significantly less storage space
- Loads faster
- Uses less memory during inference
- Retains most of the original model's capabilities
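
As a rough illustration of the memory point, compare raw weight storage for ~7B parameters at 16-bit versus 8-bit precision (a back-of-envelope sketch; actual file sizes depend on the serialization format and on which layers are quantized):

```python
PARAMS = 7_000_000_000  # approximate parameter count of a 7B model

fp16_gb = PARAMS * 2 / 1e9  # 2 bytes per weight at 16-bit precision
int8_gb = PARAMS * 1 / 1e9  # 1 byte per weight at 8-bit precision

print(f"fp16: ~{fp16_gb:.0f} GB, int8: ~{int8_gb:.0f} GB")
```

Quantization alone halves the weights; the card's 65.0% figure presumably also reflects pruning and other savings beyond the 8-bit conversion.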

## Original Model

For information about the original model, see [tencent/Hunyuan-MT-7B](https://huggingface.co/tencent/Hunyuan-MT-7B).

## License

This model is released under the Apache 2.0 license.

## NanoQuant

NanoQuant is a model compression system that achieves up to 99.95% size reduction at its most aggressive settings while preserving most model performance. Learn more in the NanoQuant documentation.