Proposal to revise multimodality statement

#12
by dkleine - opened

The current sentence in the model card

Gemma 3 models are multimodal, handling text and image input and generating text output

appears overly broad, as not all Gemma 3 model sizes support image input (the smaller 270M and 1B variants are text-only).

Google org

Hi @dkleine, thank you for bringing this to our attention. We will forward this feedback to the relevant team so they can clarify the description.
