| Model Configuration | Value |
|---|---|
| Source Model | meta-llama/Llama-3.3-70B-Instruct |
| Inference API | MLC_LLM |
| Quantization | q4f16_ft |
| Model Type | llama |
| Vocab Size | 128256 |
| Context Window Size | 131072 |
| Prefill Chunk Size | 8192 |
| Temperature | 0.6 |
| Repetition Penalty | 1.0 |
| top_p | 0.9 |
| pad_token_id | 0 |
| bos_token_id | 128000 |
| eos_token_id | [128001, 128008, 128009] |
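The table above maps directly onto MLC LLM's Python engine API. Below is a minimal sketch of loading the quantized model and generating with the listed sampling parameters; the model path is an assumption (a placeholder for the actual MLC-compiled weight directory or repo), not a confirmed repository name.

```python
from mlc_llm import MLCEngine

# Hypothetical path to the MLC-compiled q4f16_ft weights; substitute the
# actual model directory or repo from jetson-ai-lab.com/models.html.
model = "Llama-3.3-70B-Instruct-q4f16_ft-MLC"

engine = MLCEngine(model)

# Sampling parameters taken from the configuration table above.
for chunk in engine.chat.completions.create(
    messages=[{"role": "user", "content": "What is 4-bit quantization?"}],
    model=model,
    temperature=0.6,  # Temperature from the table
    top_p=0.9,        # top_p from the table
    stream=True,
):
    for choice in chunk.choices:
        print(choice.delta.content, end="", flush=True)
print()

engine.terminate()
```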
See jetson-ai-lab.com/models.html for benchmarks, usage examples, and containers for deploying local serving and inference with these quantized models.
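For container-based serving, MLC LLM exposes an OpenAI-compatible REST endpoint (by default on 127.0.0.1:8000 when launched via `mlc_llm serve`). The sketch below queries such a server with plain HTTP, assuming it was started with the model above; the model id and port are assumptions to adjust for your deployment.

```python
import requests

# Assumes `mlc_llm serve <model>` is running on its default host/port;
# adjust the URL if the container maps a different port.
url = "http://127.0.0.1:8000/v1/chat/completions"

payload = {
    "model": "Llama-3.3-70B-Instruct-q4f16_ft-MLC",  # hypothetical model id
    "messages": [{"role": "user", "content": "Hello from Jetson!"}],
    "temperature": 0.6,  # matches the configuration table
    "top_p": 0.9,
    "stream": False,
}

resp = requests.post(url, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```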