Not enough parameters in 4b config.json
#14 opened by krammnic
Hello there!
We are adding Gemma3 to the torchtune: https://github.com/pytorch/torchtune/pull/2485
Unfortunately, it seems to me that there are not enough parameters in config.json for the 4B model.
For instance, let's compare the 4B "text_config" and the 12B "text_config":
4B:
```json
"text_config": {
  "hidden_size": 2560,
  "intermediate_size": 10240,
  "model_type": "gemma3_text",
  "num_hidden_layers": 34,
  "rope_scaling": {
    "factor": 8.0,
    "rope_type": "linear"
  },
  "sliding_window": 1024
},
```
12B:
```json
"text_config": {
  "hidden_size": 3840,
  "intermediate_size": 15360,
  "model_type": "gemma3_text",
  "num_attention_heads": 16,
  "num_hidden_layers": 48,
  "num_key_value_heads": 8,
  "rope_scaling": {
    "factor": 8.0,
    "rope_type": "linear"
  },
  "sliding_window": 1024
},
```
If this config is intentional for some reason, please let me know. Unfortunately, it makes the integration less clean, as this information (e.g. "num_attention_heads" and "num_key_value_heads", which the 4B config lacks) is required at the conversion stage.
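To illustrate why this matters at conversion time, here is a minimal sketch. It is not torchtune's actual converter; the function name, the builder argument names, and the fallback head counts are assumptions made only for this example. When the keys are present (as in the 12B config), everything can be read from config.json; when they are absent (as in the 4B config), the converter has to hard-code values instead.

```python
# Hypothetical sketch of mapping the HF "text_config" to model builder
# arguments. Names and fallback values are assumptions, not torchtune code.
import json

# Fallbacks the converter would have to hard-code for 4B, since config.json
# does not carry these fields (values assumed here for illustration only).
ASSUMED_4B_DEFAULTS = {
    "num_attention_heads": 8,
    "num_key_value_heads": 4,
}


def text_config_to_builder_args(config_path: str) -> dict:
    # Read the "text_config" sub-dict from the HF config.json.
    with open(config_path) as f:
        text_cfg = json.load(f)["text_config"]

    return {
        "embed_dim": text_cfg["hidden_size"],
        "intermediate_dim": text_cfg["intermediate_size"],
        "num_layers": text_cfg["num_hidden_layers"],
        # Present in the 12B config, missing from the 4B config, so the
        # converter must fall back to hard-coded defaults for 4B.
        "num_heads": text_cfg.get(
            "num_attention_heads", ASSUMED_4B_DEFAULTS["num_attention_heads"]
        ),
        "num_kv_heads": text_cfg.get(
            "num_key_value_heads", ASSUMED_4B_DEFAULTS["num_key_value_heads"]
        ),
    }
```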
Thanks!