image_token_id mismatch causes "Image features and image tokens do not match" error in OSS-20B model
Issue Description
When using the OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF model, inference fails with the following error:
-> ValueError: Image features and image tokens do not match: tokens: 0, features 3328
Root Cause
The issue stems from a token ID mismatch between the model configuration and the tokenizer:
- The model's config.json specifies "image_token_id": 151671
- However, the OSS-20B tokenizer actually maps <IMG_CONTEXT> to token ID 200021 (as seen in tokenizer_config.json)
- The 14B model uses 151671 for <IMG_CONTEXT> (in its added_tokens.json), which appears to have been carried over to the OSS-20B config
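The mismatch described above can be checked with a small sketch like the one below. It is an illustration, not code from the model repository: the helper names and `StubTokenizer` are hypothetical, and the input ID list is made up. The only real API assumed is the tokenizer method `convert_tokens_to_ids`. The second helper shows why the error would report "tokens: 0": if the prompt contains ID 200021 but the model searches for 151671, no image positions are found.

```python
def check_image_token_id(config_image_token_id, tokenizer, image_token="<IMG_CONTEXT>"):
    """Compare the config's image token ID with the ID the tokenizer assigns."""
    tokenizer_id = tokenizer.convert_tokens_to_ids(image_token)
    return {
        "config_id": config_image_token_id,
        "tokenizer_id": tokenizer_id,
        "match": config_image_token_id == tokenizer_id,
    }


def count_image_tokens(input_ids, image_token_id):
    """Count positions in a flat list of token IDs that equal the image token ID."""
    return sum(1 for token_id in input_ids if token_id == image_token_id)


class StubTokenizer:
    """Hypothetical stand-in for the OSS-20B tokenizer (only the mapping from this issue)."""

    def convert_tokens_to_ids(self, token):
        return {"<IMG_CONTEXT>": 200021}.get(token)


report = check_image_token_id(151671, StubTokenizer())
print(report["match"])  # False

# Illustrative prompt IDs: two image placeholders as the tokenizer would emit them.
input_ids = [1, 200021, 200021, 2]
wrong_count = count_image_tokens(input_ids, 151671)  # 0, the ID from config.json
right_count = count_image_tokens(input_ids, 200021)  # 2, the ID from the tokenizer
```

With a real checkpoint, the same check could presumably be run by passing model.config.image_token_id and the loaded tokenizer in place of the stub.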
Workaround
Users can fix this by manually updating the image_token_id after loading the model:
from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(model_name, ...)
model.config.image_token_id = 200021  # Correct token ID for OSS-20B; config.json ships 151671
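As a variation on the workaround, the ID could be read from the tokenizer instead of being hardcoded. The following is a minimal sketch under assumptions: `sync_image_token_id` is a hypothetical helper, the stub objects stand in for the real config and tokenizer, and it assumes the config exposes a writable `image_token_id` attribute as in the workaround above.

```python
from types import SimpleNamespace


def sync_image_token_id(config, tokenizer, image_token="<IMG_CONTEXT>"):
    """Set config.image_token_id to the tokenizer's ID for image_token.

    Returns (old_id, new_id). Raises ValueError if the tokenizer does not
    know the token (it returns None or its unknown-token ID).
    """
    token_id = tokenizer.convert_tokens_to_ids(image_token)
    unk_id = getattr(tokenizer, "unk_token_id", None)
    if token_id is None or (unk_id is not None and token_id == unk_id):
        raise ValueError(f"Tokenizer has no ID for {image_token!r}")
    old_id = config.image_token_id
    config.image_token_id = token_id
    return old_id, token_id


class StubTokenizer:
    """Hypothetical stand-in that only knows the mapping reported in this issue."""

    unk_token_id = None

    def convert_tokens_to_ids(self, token):
        return {"<IMG_CONTEXT>": 200021}.get(token)


# Stand-in for model.config, starting with the value shipped in config.json.
stub_config = SimpleNamespace(image_token_id=151671)
old_id, new_id = sync_image_token_id(stub_config, StubTokenizer())
print(old_id, new_id)  # 151671 200021
```

Reading the ID from the tokenizer avoids repeating the magic number, and the explicit error makes a missing token visible instead of silently writing a wrong ID.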
Suggested Fix
Update the model's config.json to set image_token_id to 200021 so that it matches the tokenizer configuration.
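For a local copy of the checkpoint, the same change could be applied with a short script. This is only a sketch: `set_image_token_id` is a hypothetical helper, it assumes image_token_id is a top-level key in config.json as quoted above, and the demo runs against a temporary file rather than a real checkpoint directory.

```python
import json
import tempfile
from pathlib import Path


def set_image_token_id(config_path, new_id):
    """Rewrite the top-level image_token_id in a JSON config file; return the old value."""
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old_id = config.get("image_token_id")
    config["image_token_id"] = new_id
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return old_id


# Demo on a throwaway file that mimics the relevant part of config.json.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = Path(tmp_dir) / "config.json"
    demo_path.write_text(json.dumps({"image_token_id": 151671}), encoding="utf-8")
    old_id = set_image_token_id(demo_path, 200021)
    patched = json.loads(demo_path.read_text(encoding="utf-8"))

print(old_id, patched["image_token_id"])  # 151671 200021
```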
Additional Note
The OSS-20B model is missing the added_tokens.json file that exists in the 14B model, though this doesn't appear to cause issues as the tokens are defined in tokenizer_config.json.
🤗 Thank you for your interest and for pointing out the hidden bug as well as providing a detailed analysis. I have already updated the model’s config and verified that it now runs successfully.