Did the base 1M model start from Qwen2.5-7B?

#4
by huu-ontocord - opened

Did you extend Qwen2.5-7B to 1M context and then do instruct tuning? Is there any relation to Qwen2.5-7B?

Specifically in the paper:
Qwen2.5-1M series are developed based on Qwen2.5 models (Yang et al., 2025) and support context length up to 1M tokens.

And this:
The first two stages are similar to those of other Qwen2.5 models, where we directly use an intermediate version from Qwen2.5 Base models for subsequent long-context training. Specifically, the model is initially trained with a context length of 4096 tokens, and then the training is transferred to a context length of 32768 tokens.
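
For concreteness, here is a minimal sketch of that staged schedule as I read it (the helper names and token budgets are hypothetical placeholders, not the actual Qwen pipeline):

```python
# Hypothetical sketch of the staged schedule in the quoted passage: pretrain at
# a 4096-token context, continue at 32768, then branch off an intermediate 32k
# checkpoint for the long-context (1M) stage. train_stage() and the stage list
# are placeholders, not the actual Qwen pipeline.

CONTEXT_STAGES = (4_096, 32_768)       # stages shared with the other Qwen2.5 models

def train_stage(weights: str, context_length: int) -> str:
    """Stand-in for one pretraining stage at a fixed context length."""
    print(f"training {weights} at context_length={context_length}")
    return f"{weights}+ctx{context_length}"

weights = "base-init"                  # placeholder for the initial weights
for ctx in CONTEXT_STAGES:
    weights = train_stage(weights, ctx)

# The 1M variants do not start from the fully finished 32k checkpoint; they
# resume from an intermediate version of it (see the reply below).
```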

Qwen org

@huu-ontocord As described in the paper, we use an intermediate version of the Qwen2.5 Base models: a checkpoint from several billion tokens before the end of the 32K training of Qwen2.5-7B/14B, taken before the learning rate had become too small.
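
To make that concrete, here is a toy sketch of why an intermediate checkpoint is preferred (the schedule shape, token budget, and checkpoint offsets below are illustrative assumptions, not the actual Qwen2.5 training configuration):

```python
# Toy illustration (not the Qwen training code): with a schedule that anneals
# the learning rate toward the end of the 32k stage, a checkpoint saved several
# billion tokens before the end still carries a learning rate large enough to
# continue training, while the final checkpoint is almost fully annealed.
# The schedule shape, token budget, and offsets below are assumptions.

PEAK_LR, FINAL_LR = 3e-4, 1e-5
STAGE_TOKENS = 2_000_000_000_000      # assumed 32k-stage token budget (2T)
DECAY_TOKENS = 20_000_000_000         # assume the LR anneals over the last 20B tokens

def lr_at(tokens_seen: int) -> float:
    """Constant LR, then a linear anneal over the final DECAY_TOKENS tokens."""
    if tokens_seen < STAGE_TOKENS - DECAY_TOKENS:
        return PEAK_LR
    frac = (STAGE_TOKENS - tokens_seen) / DECAY_TOKENS   # 1 -> 0 across the anneal
    return FINAL_LR + (PEAK_LR - FINAL_LR) * max(frac, 0.0)

for rollback_b in (0, 5, 10, 20):     # billions of tokens before the end of the stage
    tokens = STAGE_TOKENS - rollback_b * 1_000_000_000
    print(f"checkpoint {rollback_b:>2}B tokens before the end: lr = {lr_at(tokens):.2e}")

# A checkpoint taken 5-10B tokens early keeps a noticeably larger learning rate,
# which is why long-context training resumes from such an intermediate version.
```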

That makes sense. Thank you very much for your answer!

huu-ontocord changed discussion status to closed
