# Highly experimental *proper* MoE
Based on SmolLM2 (a Llama-architecture model).
MoE-ified, then further trained on a general dataset.
### Info
```
MoE layers: [8, 12, 16, 20, 24, 28]
Top-k: 2 (activates 50.0% of experts per token)
Hidden size: 960
Total parameters: 494,554,560
Trainable parameters: 494,554,560
Auxiliary loss weight: 0.01
```
Code: https://gist.github.com/cappuch/6a454ec8d2d349a27f9fd84f6ac90554
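
The gist above holds the actual implementation; what follows is only a minimal illustrative sketch of a layer like the one described by the config. It assumes 4 experts per MoE layer (implied by top-k 2 activating 50% of experts), SwiGLU expert MLPs as in Llama, an intermediate size of 2560, and a Switch-Transformer-style load-balancing auxiliary loss weighted by 0.01. The class names, the intermediate size, and the specific aux-loss formulation are assumptions, not the gist's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUExpert(nn.Module):
    """Llama-style gated MLP used as a single expert (illustrative)."""
    def __init__(self, hidden_size: int, intermediate_size: int):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
        self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))


class MoEFeedForward(nn.Module):
    """Top-k routed MoE feed-forward layer with a load-balancing aux loss."""
    def __init__(self, hidden_size=960, intermediate_size=2560,  # 2560 is an assumption
                 num_experts=4, top_k=2, aux_loss_weight=0.01):
        super().__init__()
        self.num_experts = num_experts
        self.top_k = top_k
        self.aux_loss_weight = aux_loss_weight
        self.router = nn.Linear(hidden_size, num_experts, bias=False)
        self.experts = nn.ModuleList(
            [SwiGLUExpert(hidden_size, intermediate_size) for _ in range(num_experts)]
        )

    def forward(self, x: torch.Tensor):
        batch, seq_len, hidden = x.shape
        flat = x.reshape(-1, hidden)                     # (tokens, hidden)
        logits = self.router(flat)                       # (tokens, num_experts)
        probs = logits.softmax(dim=-1)
        top_p, top_idx = probs.topk(self.top_k, dim=-1)  # (tokens, top_k)
        top_p = top_p / top_p.sum(dim=-1, keepdim=True)  # renormalize kept probs

        # Dispatch each token to its top-k experts and mix the weighted outputs.
        out = torch.zeros_like(flat)
        for e, expert in enumerate(self.experts):
            token_ids, slot_ids = (top_idx == e).nonzero(as_tuple=True)
            if token_ids.numel() == 0:
                continue
            weight = top_p[token_ids, slot_ids].unsqueeze(-1)
            out[token_ids] += weight * expert(flat[token_ids])

        # Switch-Transformer-style load-balancing loss: fraction of tokens routed
        # to each expert times the mean router probability for that expert.
        counts = F.one_hot(top_idx, self.num_experts).sum(dim=(0, 1)).float()
        frac_tokens = counts / counts.sum()
        mean_probs = probs.mean(dim=0)
        aux_loss = self.aux_loss_weight * self.num_experts * (frac_tokens * mean_probs).sum()

        return out.reshape(batch, seq_len, hidden), aux_loss


# Usage: add `aux` to the language-modeling loss during training.
layer = MoEFeedForward()
y, aux = layer(torch.randn(2, 16, 960))
```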