---
title: Chunking
emoji: 🐠
colorFrom: pink
colorTo: pink
sdk: gradio
sdk_version: 4.0.2
app_file: app.py
pinned: false
license: apache-2.0
---
Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

# LLMlocal's Text Tokenization Tool

## Introduction

LLMlocal's Text Tokenization Tool is a user-friendly application designed to tokenize large bodies of text using various methods. This tool is built using Gradio, a Python library that allows the easy creation of web interfaces for machine learning models.

## Features

- Tokenization Method Selection: Users can select different text tokenization methods. The current version supports the RecursiveCharacterTextSplitter method.
- Customizable Parameters: Users have the flexibility to set parameters like chunk size, chunk overlap, and the number of chunks to display.
- Interactive Interface: The tool features an intuitive interface with dropdowns, textboxes, and number inputs for easy interaction.

## Installation

Before using the tool, ensure you have Python and the necessary packages installed. You can install the required packages using the following command:

    pip install gradio pandas langchain

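
To give a feel for what the chunk size and chunk overlap parameters do, here is a minimal sketch in plain Python. It is not the RecursiveCharacterTextSplitter the tool uses (that splitter prefers to break on separators such as paragraphs and sentences); it is a simplified fixed-width sliding window, and the function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of at most chunk_size characters.

    Simplified illustration only: consecutive chunks share chunk_overlap
    characters. This is NOT how RecursiveCharacterTextSplitter chooses
    its boundaries.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the current window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1):
        print(chunk)
```

With a chunk size of 4 and an overlap of 1, each window starts 3 characters after the previous one, so the last character of one chunk is repeated as the first character of the next.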

## Usage

To use the tool, follow these simple steps:

1. Launch the Tool: Run the provided Python script. This will launch the Gradio interface in your default web browser.
2. Select a Tokenization Method: Choose the desired method from the dropdown menu.
3. Enter Text: Type or paste the text you want to tokenize in the textbox.
4. Set Parameters: Adjust the chunk size, chunk overlap, and the number of chunks as per your requirements.
5. View Results: The tokenized text will be displayed in a table format with details such as chunk number, text chunk, character count, and token count.

## Contributing

Feedback and contributions to enhance the tool are always welcome. Please feel free to raise issues or submit pull requests on the repository.

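
For anyone looking to extend the tool, the results table described in the usage steps could be assembled along the following lines. This is a sketch under stated assumptions, not the tool's actual code: the function name `build_results_rows` and the column keys are hypothetical, and the token count here is a plain whitespace split, which may differ from the tokenizer the tool uses.

```python
def build_results_rows(chunks: list[str], max_chunks: int) -> list[dict]:
    """Build one row per chunk, limited to the first max_chunks chunks."""
    rows = []
    for number, chunk in enumerate(chunks[:max_chunks], start=1):
        rows.append({
            "chunk_number": number,
            "text_chunk": chunk,
            "character_count": len(chunk),
            # Assumption: tokens are approximated by whitespace-separated words.
            "token_count": len(chunk.split()),
        })
    return rows


if __name__ == "__main__":
    example_chunks = ["The quick brown fox", "fox jumps over", "over the lazy dog"]
    for row in build_results_rows(example_chunks, max_chunks=2):
        print(row)
```

A list of dictionaries like this can be passed directly to `pandas.DataFrame(rows)` to get a table with one column per key.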

## License

This tool is open-source and available under the Apache-2.0 license.

## Acknowledgments

Thanks to the developers of Gradio, Pandas, and LangChain for providing the libraries that made this tool possible.