AllSparkv2: A Language-centric Progressive Omni-modal Learning Framework

Run Shao and Haifeng Li, School of Geosciences and Info-physics, Central South University

Introduction

AllSparkv2 is a progressive multimodal learning framework that decouples cross-modal general knowledge from modality-specific knowledge at both the architecture and the training-strategy level. Inspired by Piaget's Theory of Cognitive Development, AllSparkv2 introduces the Modal Mixture of Experts (M-MoE) architecture, in which dedicated experts handle different modalities to decouple the parameter space, and experts for new modalities inherit cross-modal general knowledge by being initialized from existing ones. Training follows a hierarchical modality learning strategy: vision is learned first, followed by point clouds. AllSparkv2 undergoes full-parameter training on vision to acquire strong cross-modal general knowledge, while for point clouds only the modality-specific experts are trained, preserving previously acquired knowledge. Experimental results demonstrate that AllSparkv2 can progressively integrate new modalities while preventing catastrophic forgetting and enhancing cross-modal performance.
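
The sketch below illustrates the two ideas described above: per-modality experts that decouple the parameter space, a new modality expert initialized by copying an existing one, and a second training stage in which only the new expert's parameters remain trainable. The class and function names (ModalMoELayer, add_modality, freeze_except) are hypothetical illustrations, not the authors' actual implementation.

```python
# Minimal sketch of the M-MoE idea, assuming a PyTorch-style implementation.
import copy
import torch
import torch.nn as nn


class ModalMoELayer(nn.Module):
    """A sub-layer whose feed-forward expert is selected by modality."""

    def __init__(self, d_model: int, d_ff: int, modalities=("text", "vision")):
        super().__init__()
        # One expert (feed-forward block) per modality: decoupled parameter space.
        self.experts = nn.ModuleDict({
            m: nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for m in modalities
        })

    def add_modality(self, new_modality: str, init_from: str):
        # A new modality expert inherits cross-modal general knowledge by
        # starting from a copy of an existing expert (e.g. point cloud from vision).
        self.experts[new_modality] = copy.deepcopy(self.experts[init_from])

    def forward(self, x: torch.Tensor, modality: str) -> torch.Tensor:
        # Tokens of a given modality are routed to that modality's expert.
        return self.experts[modality](x)


def freeze_except(model: nn.Module, trainable_modality: str):
    """Second stage: train only the new modality's experts, freeze everything else."""
    for name, param in model.named_parameters():
        param.requires_grad = f"experts.{trainable_modality}." in name


# Usage sketch: stage 1 trains all parameters on vision; stage 2 adds a
# point-cloud expert initialized from the vision expert and trains only it.
layer = ModalMoELayer(d_model=1024, d_ff=4096)
layer.add_modality("point_cloud", init_from="vision")
freeze_except(layer, "point_cloud")
```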

Note

We provide this model in four sizes: 0.5B, 1.5B, 3B, and 7B. You can find them at the following links:
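
As an illustration, the checkpoint in this repository can be fetched with huggingface_hub; the other sizes are downloaded the same way from their respective repositories. This is a minimal sketch using only the repo id of this card.

```python
# Download the 1.5B vision + point-cloud checkpoint hosted in this repository.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(repo_id="ShaoRun/AllSparkv2-1.5B-V-P")
print(f"Checkpoint downloaded to {local_dir}")
```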

If you're using AllSparkv2 in your research or applications, please cite using this BibTeX:


License

This repository is released under the BSD 3-Clause License.

Safetensors checkpoint: 5.16B params, BF16

Model tree for ShaoRun/AllSparkv2-1.5B-V-P

Base model: Qwen/Qwen2.5-1.5B (this model is finetuned from it)