image_token_id mismatch causes "Image features and image tokens do not match" error in OSS-20B model

#1
by juno-kai - opened

Issue Description

When using the OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF model, inference fails with the following error:
-> ValueError: Image features and image tokens do not match: tokens: 0, features 3328

Root Cause

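The issue stems from a token ID mismatch between the model configuration and the tokenizer. Because the model looks for a token ID that the tokenizer never emits, it finds no image tokens in the input (hence "tokens: 0"):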

  • The model's config.json specifies "image_token_id": 151671
  • However, the OSS-20B tokenizer actually maps <IMG_CONTEXT> to token ID 200021 (as seen in tokenizer_config.json)
  • The 14B model uses 151671 for <IMG_CONTEXT> (in its added_tokens.json), which appears to have been carried over to the OSS-20B config
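The mismatch described above can be checked with a short script. This is a minimal sketch, not the maintainers' method: the helper names image_token_mismatch and check_checkpoint are hypothetical, and it assumes the checkpoint's config exposes image_token_id and that the tokenizer resolves <IMG_CONTEXT> through convert_tokens_to_ids.

```python
def image_token_mismatch(config_image_token_id, tokenizer, token="<IMG_CONTEXT>"):
    """Return (config_id, tokenizer_id) if the two IDs differ, else None.

    Hypothetical helper: `tokenizer` only needs a convert_tokens_to_ids method.
    """
    tokenizer_id = tokenizer.convert_tokens_to_ids(token)
    if tokenizer_id != config_image_token_id:
        return (config_image_token_id, tokenizer_id)
    return None


def check_checkpoint(model_name):
    """Load the config and tokenizer for `model_name` and compare the IDs.

    Assumes the config object has an `image_token_id` attribute.
    """
    # Imported lazily so the pure helper above has no hard dependency.
    from transformers import AutoConfig, AutoTokenizer

    config = AutoConfig.from_pretrained(model_name)
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    return image_token_mismatch(config.image_token_id, tokenizer)
```

Calling check_checkpoint("OpenGVLab/InternVL3_5-GPT-OSS-20B-A4B-Preview-HF") would be expected to return a non-None pair for an affected revision of the repository and None once the config matches the tokenizer.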

Workaround

Users can fix this by manually updating the image_token_id after loading the model:

from transformers import AutoModelForImageTextToText

model = AutoModelForImageTextToText.from_pretrained(model_name, ...)
model.config.image_token_id = 200021  # Correct token ID for OSS-20B
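As a variation on the workaround, the ID can be read from the tokenizer instead of being hard-coded. The following is only a sketch under assumptions: sync_image_token_id is a hypothetical helper, and it assumes that overriding model.config.image_token_id after loading is sufficient, as the workaround above suggests.

```python
def sync_image_token_id(model, tokenizer, token="<IMG_CONTEXT>"):
    """Set model.config.image_token_id to the tokenizer's ID for `token`.

    Hypothetical helper; raises if the token is not in the vocabulary.
    """
    token_id = tokenizer.convert_tokens_to_ids(token)
    unk_id = getattr(tokenizer, "unk_token_id", None)
    # An unknown token maps to the unk ID (or None when no unk token is set).
    if token_id is None or token_id == unk_id:
        raise ValueError(f"{token!r} is not in the tokenizer vocabulary")
    model.config.image_token_id = token_id
    return token_id
```

It would be called once after loading the model and its tokenizer, before running inference, for example sync_image_token_id(model, tokenizer).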

Suggested Fix

Update the model's config.json to set "image_token_id": 200021 so that it matches the tokenizer configuration.

Additional Note

The OSS-20B model is missing the added_tokens.json file that exists in the 14B model, though this doesn't appear to cause issues as the tokens are defined in tokenizer_config.json.

OpenGVLab org

🤗 Thank you for your interest and for pointing out the hidden bug as well as providing a detailed analysis. I have already updated the model’s config and verified that it now runs successfully.
