---
datasets:
- HuggingFaceH4/ultrachat_200k
base_model:
- meta-llama/Meta-Llama-3-70B-Instruct
library_name: transformers
---

## meta-llama/Meta-Llama-3-70B-Instruct - W8A8_FP8 Compression

This is a compressed version of meta-llama/Meta-Llama-3-70B-Instruct, quantized with the W8A8_FP8 scheme (FP8 weights and FP8 activations) using [llmcompressor](https://github.com/vllm-project/llm-compressor). Hedged sketches of how to serve the model and how such a compression run is typically reproduced appear at the end of this card.

## Compression Configuration

- Base Model: meta-llama/Meta-Llama-3-70B-Instruct
- Compression Scheme: W8A8_FP8
- Calibration Dataset: HuggingFaceH4/ultrachat_200k
- Dataset Split: train_sft
- Number of Calibration Samples: 512
- Preprocessor: chat
- Maximum Sequence Length: 8192

## Sample Output

No sample prompt/output pair is available: the generation step failed while this card was being built, aborting with the CUDA error below. Rerunning generation (optionally with `CUDA_LAUNCH_BLOCKING=1` for a synchronous stack trace) would be needed to produce a real sample.

```
CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
```

## Evaluation

No evaluation results are included in this card.
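
## Inference (sketch)

A minimal serving sketch. Checkpoints produced by llmcompressor are saved in the compressed-tensors format, which vLLM loads natively. The repo id below is a placeholder for this model's actual Hub id, and the chat message and sampling settings are illustrative, not taken from this card.

```python
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_ID = "path/or/hub-id-of-this-checkpoint"  # placeholder, not from this card

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
llm = LLM(model=MODEL_ID)  # vLLM picks up the compressed-tensors FP8 config

# Llama 3 Instruct expects chat-formatted prompts.
messages = [{"role": "user", "content": "Summarize FP8 quantization in two sentences."}]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

outputs = llm.generate([prompt], SamplingParams(temperature=0.6, max_tokens=128))
print(outputs[0].outputs[0].text)
```

Note that a 70B-parameter FP8 checkpoint still needs roughly 70 GB for weights alone, so serving will usually require multiple GPUs (e.g. `tensor_parallel_size` in `LLM(...)`).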
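
## Reproducing the Compression (sketch)

A hedged reconstruction of the compression run from the configuration listed above, following llmcompressor's documented `oneshot` calibration flow. The exact recipe is not included in this card, so the static `FP8` scheme, the `lm_head` ignore list, and the choice of the first 512 calibration samples are assumptions based on the library's standard W8A8 FP8 examples; exact entry points may vary with the llmcompressor version.

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from llmcompressor.modifiers.quantization import QuantizationModifier
from llmcompressor.transformers import oneshot

MODEL_ID = "meta-llama/Meta-Llama-3-70B-Instruct"
NUM_SAMPLES = 512   # from the configuration above
MAX_SEQ_LEN = 8192  # from the configuration above

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID, torch_dtype="auto", device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)

# "chat" preprocessor: render each ultrachat conversation with the chat template,
# then tokenize, as in llmcompressor's calibration examples.
ds = load_dataset("HuggingFaceH4/ultrachat_200k", split=f"train_sft[:{NUM_SAMPLES}]")
ds = ds.map(lambda ex: {
    "text": tokenizer.apply_chat_template(ex["messages"], tokenize=False)
})
ds = ds.map(
    lambda ex: tokenizer(
        ex["text"], max_length=MAX_SEQ_LEN, truncation=True, add_special_tokens=False
    ),
    remove_columns=ds.column_names,
)

# Assumed recipe: static FP8 quantization of weights and activations for all
# Linear layers, keeping lm_head in high precision (the library's usual default).
recipe = QuantizationModifier(targets="Linear", scheme="FP8", ignore=["lm_head"])

oneshot(
    model=model,
    dataset=ds,
    recipe=recipe,
    max_seq_length=MAX_SEQ_LEN,
    num_calibration_samples=NUM_SAMPLES,
    output_dir="Meta-Llama-3-70B-Instruct-W8A8-FP8",
)
```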