Model Configuration

Source Model          meta-llama/Llama-3.3-70B-Instruct
Inference API         MLC_LLM
Quantization          q4f16_ft
Model Type            llama
Vocab Size            128256
Context Window Size   131072
Prefill Chunk Size    8192
Temperature           0.6
Repetition Penalty    1.0
top_p                 0.9
pad_token_id          0
bos_token_id          128000
eos_token_id          [128001, 128008, 128009]
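
As a rough illustration of what the sampling defaults above mean at decode time, here is a minimal Python sketch of temperature scaling followed by nucleus (top-p) filtering, plus a stop check against the listed eos_token_id values. It is a generic textbook version under stated assumptions, not MLC_LLM's actual implementation; the function names `sample_top_p` and `is_stop_token` are hypothetical. A repetition penalty of 1.0 leaves the logits unchanged, so the sketch omits it.

```python
import math
import random

# Values copied from the configuration table above.
TEMPERATURE = 0.6
TOP_P = 0.9
EOS_TOKEN_IDS = {128001, 128008, 128009}


def sample_top_p(logits, temperature=TEMPERATURE, top_p=TOP_P, rng=random):
    """Sample one token id from raw logits (hypothetical helper, generic algorithm)."""
    # Temperature scaling; subtract the max before exp() for numerical stability.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    weights = [math.exp(x - peak) for x in scaled]
    total = sum(weights)
    probs = [w / total for w in weights]

    # Keep the smallest set of most-likely tokens whose cumulative mass reaches top_p.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, mass = [], 0.0
    for token_id in ranked:
        kept.append(token_id)
        mass += probs[token_id]
        if mass >= top_p:
            break

    # random.choices renormalizes the weights of the kept tokens.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


def is_stop_token(token_id):
    """True if the token is one of the configured end-of-sequence ids."""
    return token_id in EOS_TOKEN_IDS
```

A generation loop would call `sample_top_p` on each step's logits and stop as soon as `is_stop_token` returns true for the sampled id. A temperature below 1.0 sharpens the distribution, so with these defaults the nucleus is usually a small subset of the 128256-entry vocabulary.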

See jetson-ai-lab.com/models.html for benchmarks, examples, and containers to deploy local serving and inference for these quantized models.