Serve with vLLM
Has anyone been able to serve the model with vLLM?
In my testing, vLLM refreshes the processor on every request, even though it is supposed to be cached. The service then goes down with HTTP Error 429 caused by the repeated HEAD requests. I have no idea what breaks the LRU cache.
After debugging, it seems that get_processor always raises AttributeError('Qwen2TokenizerFast has no attribute start_image_token').
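For reference, this is roughly how I am hitting the vLLM OpenAI-compatible server; it is only a sketch, and the checkpoint name, port, and image URL are placeholders for my actual setup (the server is assumed to have been started with `vllm serve` beforehand):

```python
# Minimal sketch of a client request against a vLLM OpenAI-compatible server.
# Assumes the server was started separately, e.g.:
#   vllm serve <model-checkpoint> --port 8000
# The model name, port, and image URL below are placeholders.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="<model-checkpoint>",  # placeholder checkpoint name
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image."},
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.png"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```

Every such request triggers a fresh processor load instead of hitting the cache.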
Thank you for your interest in our work.
vLLM does support the GitHub-format InternVL. However, the error you encountered seems to come from the preprocessor assuming the model is in HuggingFace format. I suggest trying an earlier version of vLLM (e.g., 0.8.5.post1 for Qwen3 or 0.10.1 for GPT-OSS), or using our HF-format checkpoint. If the issue persists, we recommend deploying with LMDeploy.
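As a rough illustration (not a tested configuration), deploying the HF-format checkpoint with LMDeploy's pipeline API looks something like the sketch below; the checkpoint name and image URL are placeholders:

```python
# Rough sketch of running the HF-format checkpoint with LMDeploy's pipeline API.
# Install first: pip install lmdeploy
# The checkpoint name and image URL are placeholders.
from lmdeploy import pipeline
from lmdeploy.vl import load_image

pipe = pipeline("<hf-format-checkpoint>")  # placeholder HF-format checkpoint

image = load_image("https://example.com/sample.png")
response = pipe(("Describe this image.", image))
print(response.text)
```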