Books from the Survivor Library (mostly ~1920s & earlier) OCR'd with recent VLMs
BEEspoke Data
community
AI & ML interests
'an LLM is only as good as the dataset it was trained on' - Sun Tzu
Organization Card
🚧 "raw" pretrained smol_llama checkpoints - WIP 🚧
- BEE-spoke-data/smol_llama-101M-GQA
  Text Generation • 0.1B • Updated • 1.98k • 30
- BEE-spoke-data/smol_llama-81M-tied
  Text Generation • 81.3M • Updated • 668 • 9
- BEE-spoke-data/smol_llama-220M-GQA
  Text Generation • 0.2B • Updated • 1.94k • 13
- BEE-spoke-data/verysmol_llama-v11-KIx2
  Text Generation • 58.1M • Updated • 618 • 4
models 57
- BEE-spoke-data/neobert-100k-test
  Fill-Mask • 0.1B • Updated • 6
- BEE-spoke-data/tiny-random-MPNetForMaskedLM
  Fill-Mask • 237k • Updated • 4
- BEE-spoke-data/wordpiece-tokenizer-32k-en_code-msp
  Updated
- BEE-spoke-data/wordpiece-tokenizer-32k-en_code-orig
  Updated
- BEE-spoke-data/bpe-tokenizer-32k-smolNeoX
  Updated
- BEE-spoke-data/pegasus-x-base-synthsumm_open-16k
  Summarization • 0.3B • Updated • 49 • 2
- BEE-spoke-data/tFINE-680m-e32-d16-gqa-flan
  0.7B • Updated • 2
- BEE-spoke-data/tFINE-680m-e32-d16-infinity_instruct-L2
  Text Generation • 0.7B • Updated • 4
- BEE-spoke-data/tFINE-900m-e16-d32-instruct_2e
  0.9B • Updated • 12
- BEE-spoke-data/tFINE-900m-instruct-orpo
  0.9B • Updated • 4
datasets 82
- BEE-spoke-data/govdocs1-pdf-source
  Viewer • Updated • 235k • 4.4k • 2
- BEE-spoke-data/govdocs1-by-extension
  Viewer • Updated • 733k • 166 • 2
- BEE-spoke-data/SurvivorLib-Nanonets-OCR-s
  Viewer • Updated • 11.7k • 246 • 3
- BEE-spoke-data/SurvivorLib-rolmOCR
  Viewer • Updated • 13.3k • 155 • 2
- BEE-spoke-data/napierone-pdf-nanonets-s
  Viewer • Updated • 9.96k • 16
- BEE-spoke-data/napierone-pdf-olmOCR
  Viewer • Updated • 19k • 27
- BEE-spoke-data/LONGCOT-merged-1M
  Viewer • Updated • 1.7M • 37 • 2
- BEE-spoke-data/cosmopedia-v2-mincols
  Viewer • Updated • 39.1M • 46 • 1
- BEE-spoke-data/reddit-title-body-hf
  Viewer • Updated • 251M • 14 • 4
- BEE-spoke-data/bigpatent-all
  Viewer • Updated • 2.43M • 1.3k