Query about `model_max_length` configuration

#4
opened by vm7608

Hello, and thank you for this fantastic model.

I have a quick question about the configuration. I noticed that `config.json` sets `max_position_embeddings` to 131072, but `tokenizer_config.json` has `model_max_length` set to 16384.
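
For reference, this is roughly how I'm checking the two values side by side (the repo id below is just a placeholder for this model):

```python
from transformers import AutoConfig, AutoTokenizer

repo_id = "org/model-name"  # placeholder, substitute the actual repo id

config = AutoConfig.from_pretrained(repo_id)
tokenizer = AutoTokenizer.from_pretrained(repo_id)

# Values read from config.json and tokenizer_config.json respectively
print(config.max_position_embeddings)  # 131072
print(tokenizer.model_max_length)      # 16384
```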

This causes a confusing situation when serving the model with vLLM. The startup log correctly shows that the engine is using the full context:
`INFO ... Using max model len 131072`

However, the API server still throws a validation warning based on the tokenizer's setting for any input over 16k tokens:
`Token indices sequence length is longer than the specified maximum sequence length for this model (38682 > 16384).`

Is this discrepancy intentional?
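
For anyone hitting the same warning when loading the tokenizer directly with transformers, overriding the limit at load time should silence it (rough sketch below, the repo id is again a placeholder), though it would be cleaner if `tokenizer_config.json` already matched the config:

```python
from transformers import AutoTokenizer

repo_id = "org/model-name"  # placeholder, substitute the actual repo id

# Override model_max_length to match max_position_embeddings so that
# inputs longer than 16384 tokens no longer trigger the length warning.
tokenizer = AutoTokenizer.from_pretrained(repo_id, model_max_length=131072)
print(tokenizer.model_max_length)  # 131072
```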

Anyway, thanks again for all your hard work on this release!
