brandonbeiler
/

InternVL3_5-38B-FP8-Dynamic

Image-Text-to-Text

compressed-tensors

Model card Files Files and versions

brandonbeiler commited on Aug 27

Commit

9328e13

·

verified ·

1 Parent(s): e6e579b

Update README.md

Files changed (1) hide show

README.md +27 -9

README.md CHANGED Viewed

@@ -1,20 +1,21 @@
 ---
-language:
-- en
-- zh
 tags:
 - fp8
-- quantization
-- dynamic
-- vision-language
-- multimodal
 - vllm
 - llm-compressor
 - internvl3.5
 pipeline_tag: image-text-to-text
 inference: false
 license: mit
 ---
 # InternVL3.5 38B FP8
@@ -41,7 +42,22 @@ The quantization process uses a specialized recipe that preserves the model's co
 | **Quantization Library** | [LLM Compressor](https://github.com/vllm-project/llm-compressor) v0.7.1 |
 | **Quantized By** | [brandonbeiler](https://huggingface.co/brandonbeiler) |
-## Usage with vLLM
 The following snippet demonstrates inference using the vLLM library.
@@ -69,6 +85,8 @@ response = model.generate(prompt, sampling_params)
 print(response[0].outputs[0].text)
 ```
 ## Technical Specifications
 ### Hardware Requirements

 ---
 tags:
 - fp8
+- fp8-dynamic
 - vllm
 - llm-compressor
 - internvl3.5
+- internvl
+language:
+- multilingual
 pipeline_tag: image-text-to-text
 inference: false
 license: mit
+base_model:
+- OpenGVLab/InternVL3_5-38B
+datasets:
+- OpenGVLab/MMPR-v1.2
+library_name: vllm
 ---
 # InternVL3.5 38B FP8
 | **Quantization Library** | [LLM Compressor](https://github.com/vllm-project/llm-compressor) v0.7.1 |
 | **Quantized By** | [brandonbeiler](https://huggingface.co/brandonbeiler) |
+## With vLLM OpenAI-Compatible Server
+You can serve the model using vLLM's OpenAI-compatible API server.
+```bash
+python -m vllm.entrypoints.openai.api_server \
+    --model brandonbeiler/InternVL3_5-38B-FP8-Dynamic \
+    --quantization compressed-tensors \
+    --served-model-name internvl3_5-38b \
+    --reasoning-parser: qwen3 \
+    --trust-remote-code \
+    --max-model-len 32768 \
+    --tensor-parallel-size 1 # Adjust based on your GPU setup
+```
+## Usage with vLLM in Python
 The following snippet demonstrates inference using the vLLM library.
 print(response[0].outputs[0].text)
 ```
 ## Technical Specifications
 ### Hardware Requirements