Leon
Leon-Leee
AI & ML interests
LLMs, code generation, chatbots, workflows
Recent Activity
Liked a dataset about 5 hours ago: nvidia/Nemotron-CC-Math-v1
Commented on a paper 18 days ago: Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning
Liked a model 21 days ago: openai/gpt-oss-20b
Organizations
Useful Pretrain-Datasets
Pretraining datasets of (possibly) good quality
High Quality Instruct Collections
awesome-zh-corpus
A selection of high-quality Chinese corpora
GPT-4 generated datasets
A collection of GPT-4-generated datasets. It may be useful for those looking for the best-quality datasets to train competitive LLMs.
Code, Math, and Reasoning related Instruct datasets
As the title says.
Code Benchmarks