Not enough parameters in 4B config.json

#14 opened by krammnic

Hello there!

We are adding Gemma3 to the torchtune: https://github.com/pytorch/torchtune/pull/2485

Unfortunately, it seems to me that there are not enough parameters in the config.json of the 4B model.

For instance, let's compare the 4B "text_config" and the 12B "text_config":

4B:

```json
"text_config": {
    "hidden_size": 2560,
    "intermediate_size": 10240,
    "model_type": "gemma3_text",
    "num_hidden_layers": 34,
    "rope_scaling": {
      "factor": 8.0,
      "rope_type": "linear"
    },
    "sliding_window": 1024
  },
```

12B:

```json
"text_config": {
    "hidden_size": 3840,
    "intermediate_size": 15360,
    "model_type": "gemma3_text",
    "num_attention_heads": 16,
    "num_hidden_layers": 48,
    "num_key_value_heads": 8,
    "rope_scaling": {
      "factor": 8.0,
      "rope_type": "linear"
    },
    "sliding_window": 1024
  },
```
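
A quick way to see exactly which keys differ is to diff the key sets of the two blocks. This is only an illustration; the dicts below are copied from the snippets above:

```python
# Minimal sketch: compare the keys of the two "text_config" blocks above
# to see which parameters the 4B config omits.
cfg_4b = {
    "hidden_size": 2560,
    "intermediate_size": 10240,
    "model_type": "gemma3_text",
    "num_hidden_layers": 34,
    "rope_scaling": {"factor": 8.0, "rope_type": "linear"},
    "sliding_window": 1024,
}
cfg_12b = {
    "hidden_size": 3840,
    "intermediate_size": 15360,
    "model_type": "gemma3_text",
    "num_attention_heads": 16,
    "num_hidden_layers": 48,
    "num_key_value_heads": 8,
    "rope_scaling": {"factor": 8.0, "rope_type": "linear"},
    "sliding_window": 1024,
}
print(sorted(cfg_12b.keys() - cfg_4b.keys()))
# -> ['num_attention_heads', 'num_key_value_heads']
```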

If this config is intentional for some reason, please let me know. Unfortunately, it makes the integration less clean, as this information is required at the conversion stage.
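
To make that last point concrete, here is a rough sketch of the kind of check a converter ends up needing. `load_text_config` and `REQUIRED_KEYS` are hypothetical names for illustration, not actual torchtune code, and the key list is just an assumption about what the model builder needs:

```python
import json

# Hypothetical: keys a converter would want before it can build the model.
REQUIRED_KEYS = ("hidden_size", "intermediate_size", "num_hidden_layers",
                 "num_attention_heads", "num_key_value_heads")

def load_text_config(config_path: str) -> dict:
    """Read text_config from a Gemma3 config.json and fail loudly if a
    key needed to build the model is missing."""
    with open(config_path) as f:
        text_cfg = json.load(f)["text_config"]

    missing = [key for key in REQUIRED_KEYS if key not in text_cfg]
    if missing:
        # For the 4B checkpoint this currently fires with
        # ['num_attention_heads', 'num_key_value_heads'], so those values
        # would have to be hard-coded on the converter side instead.
        raise KeyError(f"text_config is missing: {missing}")
    return text_cfg
```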

Thanks!
