Need to set `scale` to 0.25 in `config.json`?

I notice that `max_position_embeddings` in `config.json` has already been set to 8192, but there is no `scale` variable in `config.json`, so it will default to 1 as per the `modelling_llama.py` in this repo. Do we need to add a `scale` variable to `config.json` and set it to 0.25 for the model to work at 8K context?
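For context, here is the relevant part of the shipped `config.json` (abridged to the field under discussion; note there is no `scale` key):

```json
{
  "max_position_embeddings": 8192
}
```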
Yeah, I defaulted `max_position_embeddings` to 8192, and the code that's run by `trust_remote_code=True` will then set `scale` to 4 automatically. If you edited `max_position_embeddings` to 4096, it would use 2 instead.

So no, you don't need to set a `scale` param. Just edit `max_position_embeddings` to whatever context length you want, and the customised Llama modelling code will figure it out, as sketched below. Just make sure to set `trust_remote_code=True`.
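Roughly, the logic boils down to something like this (a minimal sketch of linear RoPE interpolation; the actual code that runs is `modelling_llama.py` in this repo, and the function and variable names here are illustrative, not the repo's):

```python
import torch

def build_rope_cache(dim: int, max_position_embeddings: int, base: float = 10000.0):
    # Derive the factor from max_position_embeddings relative to the
    # original 2048-token training context: 8192 -> 4, 4096 -> 2.
    scale = max_position_embeddings / 2048
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    # Compress positions by `scale` (the same thing as multiplying them
    # by 0.25 when max_position_embeddings is 8192), so all positions
    # land inside the range the base model was trained on.
    t = torch.arange(max_position_embeddings).float() / scale
    freqs = torch.outer(t, inv_freq)
    emb = torch.cat((freqs, freqs), dim=-1)
    return emb.cos(), emb.sin()
```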
Sorry, I should have mentioned that in the fp16 READMEs - it's described in my GPTQ READMEs, but I didn't put it in the fp16 ones.
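For anyone landing here, loading looks roughly like this (the model ID is a placeholder; substitute the actual repo name):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-superhot-8k-model"  # placeholder repo ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
# trust_remote_code=True is required so the customised modelling_llama.py
# runs and sets the RoPE scale from max_position_embeddings automatically.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
)
```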
No worries, thanks for the clarification! I just saw the code in `modelling_llama.py` where `scale` is calculated from `max_position_embeddings`.