	---
	title: README
	emoji: 📚
	colorFrom: pink
	colorTo: red
	sdk: streamlit
	pinned: false
	---

	# Less is More: Pre-Training Cross-Lingual Small-Scale Language Models with Cognitively-Plausible Curriculum Learning Strategies

	Available from: https://arxiv.org/abs/2410.22886

	Salhan et al. (2024) create age-ordered corpora of Child-Directed Speech for four typologically distant language families, and use them to implement small-scale language models (SSLMs) and acquisition-inspired curricula cross-lingually.

	The MAO-CHILDES dataset contains extracted orthographic datasets for French, German, Japanese and Chinese, as well as several lower-resource languages. It is part of a wider effort towards cognitively-inspired pretraining using resources from language acquisition.

	You can also find pretrained BabyLMs for French, German, Japanese and Chinese, each trained with three different cognitively-inspired curriculum learning strategies, in the branches of each language-specific BabyLM.
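
Since each curriculum variant lives on its own branch, a checkpoint can be selected with the `revision` argument of `from_pretrained` in the `transformers` library. The following is a minimal sketch, not the authors' documented loading procedure: the repository id and branch names are hypothetical placeholders, and `AutoModel` is used only because this README does not state the model architecture. Substitute the real names from the organisation's model pages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """A model repository on the Hub plus the branch holding one curriculum variant."""
    repo_id: str   # hypothetical, e.g. "<org>/<language>-babylm"
    revision: str  # branch name; hypothetical, check the repo's branch list


def branch_checkpoints(repo_id, branches):
    """Build one Checkpoint per branch of a single language-specific repo."""
    return [Checkpoint(repo_id=repo_id, revision=branch) for branch in branches]


def load(checkpoint):
    """Download the tokenizer and model from the given branch.

    Requires the `transformers` package and network access, so the import
    is kept local to this function.
    """
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(
        checkpoint.repo_id, revision=checkpoint.revision
    )
    model = AutoModel.from_pretrained(
        checkpoint.repo_id, revision=checkpoint.revision
    )
    return tokenizer, model


if __name__ == "__main__":
    # Placeholder names only: replace with a real repo id and its branches.
    candidates = branch_checkpoints(
        "your-org/your-babylm", ["curriculum-a", "curriculum-b", "curriculum-c"]
    )
    for ckpt in candidates:
        print(ckpt.repo_id, ckpt.revision)
```

Calling `load(candidates[0])` would then fetch the tokenizer and weights from that branch, provided the repository and branch exist.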