ParScale
community

Base models trained on 1T high-quality tokens, demonstrating strong competitiveness among existing SOTA small models (<2B).
Instruct models from the ParScale-1.8B base models, trained on SmolTalk-1M to enable conversational capabilities.
- ParScale/ParScale-1.8B-P8-Inst
  Text Generation • 2B • Updated • 63 • 2
- ParScale/ParScale-1.8B-P4-Inst
  Text Generation • 2B • Updated • 16 • 1
- ParScale/ParScale-1.8B-P2-Inst
  Text Generation • 2B • Updated • 6
- ParScale/ParScale-1.8B-P1-Inst
  Text Generation • 2B • Updated • 75 • 1
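A minimal loading sketch for one of the instruct checkpoints is shown below. It assumes the models work with the standard transformers causal-LM API and that trust_remote_code=True is required for the custom ParScale architecture; the prompt text is only an example.

```python
# Minimal sketch: load a ParScale instruct checkpoint and generate a chat reply.
# Assumptions: standard transformers causal-LM interface; trust_remote_code=True
# because ParScale uses a custom model implementation.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ParScale/ParScale-1.8B-P8-Inst"
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Build a chat prompt with the tokenizer's chat template and generate.
messages = [{"role": "user", "content": "Summarize parallel scaling in one sentence."}]
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
output_ids = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```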
Continual pre-training of the Qwen-2.5-3B model.
Checkpoints for PEFT of Qwen-2.5; the backbone weights are frozen.
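The sketch below is a generic illustration of attaching an adapter to a frozen backbone with the peft library; the backbone id Qwen/Qwen2.5-3B and the adapter repo placeholder are assumptions, and the actual ParScale checkpoint format may differ.

```python
# Generic sketch of loading adapter weights on a frozen backbone.
# Assumptions: checkpoints follow the standard peft adapter format;
# "Qwen/Qwen2.5-3B" is the assumed backbone; the adapter repo id below
# is a hypothetical placeholder, not an actual checkpoint name.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B")

# Freeze the backbone; only adapter parameters would be updated during training.
for p in base.parameters():
    p.requires_grad = False

# Attach the adapter weights from a (placeholder) checkpoint repo.
model = PeftModel.from_pretrained(base, "ParScale/<peft-checkpoint>")
```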