magibu's Collections
Pretrain Datasets
Updated Jan 3
Datasets we use for pretraining large language models
omarkamali/wikipedia-monthly • Updated 5 days ago • 190M • 2.71k • 52
alibayram/hukuk_soru_cevap • Updated Nov 6, 2024 • 2.08k • 23 • 14
umutertugrul/turkish-hospital-medical-articles • Updated Oct 2, 2025 • 24.6k • 33 • 8
umutertugrul/turkish-medical-articles • Updated Oct 2, 2025 • 42.8k • 12 • 3
alibayram/tr-books • Updated Dec 17, 2025 • 3.7k • 7
selimfirat/bilkent-turkish-writings-dataset • Updated May 24, 2025 • 25.1k • 37 • 8
umutertugrul/turkish-academic-theses-dataset • Updated Aug 18, 2025 • 649k • 33 • 8
alibayram/onedio_haberler • Updated Jun 18, 2024 • 66.7k • 7 • 5
habanoz/news-tr-1.8M • Updated Oct 6, 2024 • 1.85M • 72 • 7
alibayram/hepsiburada_yorumlar • Updated Jun 18, 2024 • 2.66M • 9 • 14
alibayram/kitapyurdu_yorumlar • Updated Jun 18, 2024 • 405k • 14
alibayram/beyazperde_yorumlar • Updated Jun 18, 2024 • 192k • 6 • 5
BILGEM-AI/BILGE-Synthetic-Stories • Updated Nov 20, 2025 • 2.87M • 234 • 5