InternVL3.5 FP8
OpenGVLab's InternVL3.5 models quantized to FP8
This is an FP8 dynamic (W8A8) quantization of OpenGVLab/InternVL3_5-14B, optimized for high-performance inference with vLLM. Both weights and activations are stored in FP8, with activation scales computed dynamically at runtime, so no calibration dataset is needed.
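For reference, checkpoints like this one are typically produced with LLM Compressor's data-free FP8_DYNAMIC scheme. The sketch below shows the general shape of that recipe; the output directory name and the ignore list for InternVL's vision tower are illustrative assumptions, not the exact settings used for this model.

from transformers import AutoModel, AutoTokenizer
from llmcompressor import oneshot
from llmcompressor.modifiers.quantization import QuantizationModifier

MODEL_ID = "OpenGVLab/InternVL3_5-14B"

model = AutoModel.from_pretrained(MODEL_ID, torch_dtype="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)

# FP8_DYNAMIC: static FP8 weights, per-token dynamic FP8 activations.
# No calibration data is required, so oneshot() runs without a dataset.
recipe = QuantizationModifier(
    targets="Linear",
    scheme="FP8_DYNAMIC",
    # Assumption: lm_head and the vision tower stay in higher precision.
    ignore=["lm_head", "re:.*vision_model.*"],
)

oneshot(model=model, recipe=recipe)

model.save_pretrained("InternVL3_5-14B-FP8-Dynamic", save_compressed=True)
tokenizer.save_pretrained("InternVL3_5-14B-FP8-Dynamic")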
You can serve the model using vLLM's OpenAI-compatible API server.
vllm serve brandonbeiler/InternVL3_5-14B-FP8-Dynamic \
--quantization compressed-tensors \
--served-model-name internvl3_5-14b \
--reasoning-parser qwen3 \
--trust-remote-code \
--max-model-len 32768 \
--tensor-parallel-size 1 # Adjust based on your GPU setup
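Once the server is running, any OpenAI-compatible client can talk to it. Below is a minimal sketch using the openai Python package; the localhost address is vLLM's default, and the image URL is a placeholder.

from openai import OpenAI

# vLLM listens on port 8000 by default; the API key is unused but required.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="internvl3_5-14b",  # must match --served-model-name above
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "https://example.com/demo.jpg"}},
            {"type": "text", "text": "Describe this image in one sentence."},
        ],
    }],
    max_tokens=128,
)
print(response.choices[0].message.content)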
Notes
This model was created using:
llmcompressor==0.7.1
compressed-tensors==0.10.2
transformers==4.55.0
torch==2.7.1
vllm==0.10.1.1
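To reproduce the quantization or rule out version skew when debugging, you can compare your installed packages against the list above. A small sketch (package names are the pip distribution names):

import importlib.metadata as md

EXPECTED = {
    "llmcompressor": "0.7.1",
    "compressed-tensors": "0.10.2",
    "transformers": "4.55.0",
    "torch": "2.7.1",
    "vllm": "0.10.1.1",
}

for pkg, want in EXPECTED.items():
    try:
        have = md.version(pkg)
    except md.PackageNotFoundError:
        print(f"{pkg}: not installed (card used {want})")
        continue
    note = "OK" if have == want else f"differs (card used {want})"
    print(f"{pkg}=={have}  {note}")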
Quantized with ❤️ using LLM Compressor for the open-source community
Base model: OpenGVLab/InternVL3_5-14B-Pretrained