Update README.md
Browse files
README.md
CHANGED
@@ -429,4 +429,38 @@ lm_eval \
|
|
429 |
--tasks truthfulqa \
|
430 |
--num_fewshot 0 \
|
431 |
--batch_size auto
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
432 |
```
|
|
|
429 |
--tasks truthfulqa \
|
430 |
--num_fewshot 0 \
|
431 |
--batch_size auto
|
432 |
+
```
|
433 |
+
|
434 |
+
#### OpenLLM v2
|
435 |
+
```
|
436 |
+
lm_eval \
|
437 |
+
--model vllm \
|
438 |
+
--model_args pretrained="neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16",dtype=auto,max_model_len=4096,tensor_parallel_size=1,enable_chunked_prefill=True \
|
439 |
+
--apply_chat_template \
|
440 |
+
--fewshot_as_multiturn \
|
441 |
+
--tasks leaderboard \
|
442 |
+
--batch_size auto
|
443 |
+
```
|
444 |
+
|
445 |
+
#### HumanEval and HumanEval+
|
446 |
+
##### Generation
|
447 |
+
```
|
448 |
+
python3 codegen/generate.py \
|
449 |
+
--model neuralmagic/Meta-Llama-3.1-70B-Instruct-quantized.w4a16 \
|
450 |
+
--bs 16 \
|
451 |
+
--temperature 0.2 \
|
452 |
+
--n_samples 50 \
|
453 |
+
--root "." \
|
454 |
+
--dataset humaneval
|
455 |
+
```
|
456 |
+
##### Sanitization
|
457 |
+
```
|
458 |
+
python3 evalplus/sanitize.py \
|
459 |
+
humaneval/neuralmagic--Meta-Llama-3.1-70B-Instruct-quantized.w4a16_vllm_temp_0.2
|
460 |
+
```
|
461 |
+
##### Evaluation
|
462 |
+
```
|
463 |
+
evalplus.evaluate \
|
464 |
+
--dataset humaneval \
|
465 |
+
--samples humaneval/neuralmagic--Meta-Llama-3.1-70B-Instruct-quantized.w4a16_vllm_temp_0.2-sanitized
|
466 |
```
|