Shouldn't InternViT-300M-448px-V2_5 be the same as the vision model of InternVL2.5?
#2 by niktheod
The InternVL2.5 paper states the following:
"""
In this report, we further refined the InternViT-300M by incrementally pre-training the previous weights on a more diverse data mixture using the NTP loss, leading to the enhanced InternViT-300M-448px-V2.5.
"""
Doesn't this mean that InternViT-300M-448px-V2_5 should be identical to the vision encoder extracted from InternVL2.5? I checked the vision encoders of InternVL2.5-1B/2B/4B/8B, and none of them shares the same parameters as InternViT-300M-448px-V2_5. Could you please clarify what the differences are?
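For reference, a parameter check of this kind can be sketched as below. This is a minimal sketch, not necessarily the exact script behind the comparison above: the helper name `diff_state_dicts`, the parameter names, and the toy values are all hypothetical, and plain Python lists stand in for real tensors so the snippet runs without downloading any checkpoint.

```python
from typing import Any, Callable, Dict, List, Mapping


def diff_state_dicts(
    a: Mapping[str, Any],
    b: Mapping[str, Any],
    equal: Callable[[Any, Any], bool],
) -> Dict[str, List[str]]:
    """Compare two name -> tensor mappings and report where they differ.

    `equal` decides whether two parameter values match, so the helper
    does not depend on any particular tensor library.
    """
    keys_a, keys_b = set(a), set(b)
    shared = sorted(keys_a & keys_b)
    return {
        # Parameter names present in only one of the two checkpoints.
        "only_in_a": sorted(keys_a - keys_b),
        "only_in_b": sorted(keys_b - keys_a),
        # Shared names, split by whether the values match.
        "different": [k for k in shared if not equal(a[k], b[k])],
        "identical": [k for k in shared if equal(a[k], b[k])],
    }


# Toy data standing in for real checkpoints (hypothetical names and values).
standalone = {"embed.weight": [1.0, 2.0], "blocks.0.weight": [3.0, 4.0]}
extracted = {
    "embed.weight": [1.0, 2.0],
    "blocks.0.weight": [3.0, 4.5],
    "extra.bias": [0.0],
}

report = diff_state_dicts(standalone, extracted, equal=lambda x, y: x == y)
print(report)
```

With PyTorch, `torch.equal` could be passed as `equal`, with `a` being the `state_dict()` of the standalone InternViT-300M-448px-V2_5 and `b` the `state_dict()` of the vision encoder submodule of an InternVL2.5 checkpoint. The name of that submodule is not given here and should be taken from the model code. Reporting missing keys separately from differing values helps distinguish a naming or structural mismatch from genuinely different weights.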