---
title: README
emoji: π
colorFrom: indigo
colorTo: purple
sdk: static
pinned: true
short_description: Explore Common Crawl's metadata and experimental datasets
---
# Common Crawl

Welcome to the Common Crawl Foundation's Hugging Face page!

We aim to provide metadata and experimental versions of our latest data products here.

### Useful Links

- [Common Crawl's official website](https://commoncrawl.org/)
- [Our existing statistics webpages](https://commoncrawl.github.io/cc-crawl-statistics/) ([GitHub repo](https://github.com/commoncrawl/cc-crawl-statistics))
- [AWS infrastructure status page](https://status.commoncrawl.org/)

### Datasets

Explore our datasets hosted on Hugging Face:

- [Common Crawl Citations](https://huggingface.co/datasets/commoncrawl/citations)
- [Common Crawl Citations, Annotated](https://huggingface.co/datasets/commoncrawl/citations-annotated)
- [Common Crawl Statistics](https://huggingface.co/datasets/commoncrawl/statistics)
- [EOT 2024 Host-Level Logs](https://huggingface.co/datasets/commoncrawl/eot2024_hostlevel_logs) (only available to EOT collaborators)

We look forward to supporting the research and development community with these resources.