metadata
language: en
tags:
- tokenizer
- pytorch
- streaming
library_name: nano
pipeline_tag: token-classification
datasets:
- Salesforce/wikitext
Nano Tokenizer
This tokenizer was trained using a Python-only pipeline (no transformers
or tokenizers
), on a dataset streamed from the Hugging Face Hub.
Usage
from transformers import PreTrainedTokenizerFast
tokenizer = PreTrainedTokenizerFast.from_pretrained("goabonga/wikitext-2-raw-v1")