---
title: README
emoji: 🏃
colorFrom: indigo
colorTo: indigo
sdk: static
pinned: false
---
**English**🌎|[简体中文](https://github.com/opendatalab/opendatalab-datasets/blob/main/introduction%20CN.md)🀄

> [!NOTE]
> 📚 In 2025, we open-sourced a high-quality multilingual dataset, **WanJuan 3.0 (WanJuan Silu)**, which comprises over 1.2TB of indigenous textual corpora from five countries. Each subset includes seven major categories and 34 subcategories, covering a wide range of local characteristics, such as history, politics, culture, real estate, shopping, weather, dining, encyclopedic knowledge, and professional expertise. The download links for the five subsets are listed here, and everyone is welcome to download and use them.
>
> WanJuan3.0 [Korean](https://opendatalab.com/OpenDataLab/WanJuan-Korean) • [Arabic](https://opendatalab.com/OpenDataLab/WanJuan-Arabic) • [Vietnamese](https://opendatalab.com/OpenDataLab/WanJuan-Vietnamese) • [Russian](https://opendatalab.com/OpenDataLab/WanJuan-Russian) • [Thai](https://opendatalab.com/OpenDataLab/WanJuan-Thai)

---

**🔥🔥🔥OpenDataLab provides an ecosystem of high-quality datasets for the community.**

It provides:

# 🌟Extensive open data resources for AI models

● A fast and simple way to access open datasets

● 7700+ large-scale, high-quality open datasets for large models

● 1200+ open datasets for computer vision