uv scripts for HF Jobs
UV Scripts
Ready-to-run ML tools powered by UV - zero setup, maximum power
Run state-of-the-art ML workflows with a single command. From OCR to classification, all scripts work instantly with uv run.
What are UV scripts?
UV scripts are self-contained Python scripts that declare their dependencies in an inline metadata block (PEP 723) at the top of the file. Run uv run script.py and uv installs those dependencies into an isolated environment before executing the script.
Perfect for:
- 🚀 GPU workflows on HF Jobs
- 💻 Local processing on your machine
- 🔄 Reproducible pipelines that work anywhere
🚀 Quick Example
# Extract text from images with state-of-the-art OCR (no local GPU needed!)
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
  your-images your-extracted-text
📚 Browse Scripts
| Script Collection | Description | GPU Required | 
|---|---|---|
| ocr | Extract text from images with VLMs (LaTeX, tables, forms) | ✅ | 
| classification | Text classification with guaranteed valid outputs | ✅ | 
| dataset-creation | Create datasets from PDFs and files | ❌ | 
| vllm | High-performance inference with vLLM | ✅ | 
| synthetic-data | Generate high-quality synthetic data with CoT reasoning | ✅ | 
| deduplication | Remove duplicates using semantic similarity | ❌ | 
| openai-oss | Generate responses with visible reasoning traces | ✅ | 
🎯 Why UV Scripts?
Zero Setup
No manual virtual environments, no dependency conflicts, no separate installation steps. UV resolves and installs each script's declared dependencies automatically when you run it.
GPU Optimized
Run on a local GPU or scale to the cloud with HF Jobs. Same script, different compute.
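The "same script, different compute" idea can be sketched as a small helper that builds either command line from the same URL and arguments. The function name build_command is hypothetical. The two command prefixes are the ones used elsewhere in this README, and the local variant assumes your machine has whatever hardware the script needs.

```python
import shlex

SCRIPT_URL = "https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py"


def build_command(script_url, script_args, flavor=None):
    """Build argv for a local run (flavor=None) or an HF Jobs run (flavor set)."""
    if flavor is None:
        prefix = ["uv", "run"]  # local execution
    else:
        prefix = ["hf", "jobs", "uv", "run", "--flavor", flavor]  # HF Jobs
    return [*prefix, script_url, *script_args]


args = ["your-images", "extracted-text"]
local_cmd = build_command(SCRIPT_URL, args)
remote_cmd = build_command(SCRIPT_URL, args, flavor="l4x1")

# Only the prefix differs; the script URL and its arguments are identical.
print(shlex.join(local_cmd))
print(shlex.join(remote_cmd))
```

Either list can be passed to subprocess.run to launch the job. The sketch only prints the commands, so it has no side effects.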
🌟 Featured Scripts
OCR Any Document Dataset
Extract text from images with state-of-the-art accuracy:
# Handles LaTeX, tables, forms, handwriting
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/ocr/raw/main/nanonets-ocr.py \
  your-images extracted-text
Deduplicate Datasets (CPU-Friendly!)
Remove duplicates using semantic similarity - no GPU needed:
# Fast semantic deduplication on CPU
uv run https://huggingface.co/datasets/uv-scripts/deduplication/raw/main/semantic-dedupe.py \
  your-dataset text your-dataset-clean \
  --method duplicates --threshold 0.9
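The script's internals are not shown in this README, so the following is only a conceptual sketch of threshold-based semantic deduplication, not the implementation in semantic-dedupe.py. It assumes each text has already been turned into an embedding vector and that the threshold is compared against cosine similarity. The toy vectors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def dedupe(embeddings, threshold=0.9):
    """Greedy pass: keep an item only if it is below the threshold
    against every item kept so far. Returns the indices that survive."""
    kept = []
    for i, emb in enumerate(embeddings):
        if all(cosine(emb, embeddings[j]) < threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical 2-D "embeddings": the second is nearly parallel to the first.
toy_embeddings = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0]]
print(dedupe(toy_embeddings, threshold=0.9))
```

A higher threshold removes only near-identical items, while a lower one also drops looser paraphrases. The greedy pass compares each item against everything already kept, so it is quadratic in the worst case. Real tools typically use an index to avoid that cost.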
Generate Synthetic Training Data
Create high-quality synthetic data with chain-of-thought reasoning:
# Generate synthetic math problems with reasoning
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/synthetic-data/raw/main/cot-self-instruct.py \
  --seed-dataset math-examples --output-dataset synthetic-math \
  --task-type reasoning --num-samples 1000
🚀 Getting Started with HF Jobs
Run any UV script on GPU infrastructure:
hf jobs uv run --flavor l4x1 \
  https://huggingface.co/datasets/uv-scripts/[collection]/raw/main/[script].py \
  [args]
Choose your GPU flavor:
- l4x1: Good balance for most tasks
- a10g-large: More memory for larger models
- a100-large: Maximum performance
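The URL template and flavor list above can be captured in a few lines. This sketch assumes the collection and script names follow the pattern shown in the template. The helper names script_url and check_flavor are hypothetical, and FLAVORS contains only the three flavors named in this README, not everything HF Jobs may offer.

```python
BASE_URL = "https://huggingface.co/datasets/uv-scripts"

# Only the flavors listed in this README; other flavors may exist.
FLAVORS = {
    "l4x1": "Good balance for most tasks",
    "a10g-large": "More memory for larger models",
    "a100-large": "Maximum performance",
}


def script_url(collection: str, script: str) -> str:
    """Fill in the [collection] and [script] slots of the raw-file URL."""
    return f"{BASE_URL}/{collection}/raw/main/{script}.py"


def check_flavor(flavor: str) -> str:
    """Return the flavor unchanged, or raise if it is not one listed above."""
    if flavor not in FLAVORS:
        raise ValueError(f"unknown flavor {flavor!r}; expected one of {sorted(FLAVORS)}")
    return flavor


print(script_url("ocr", "nanonets-ocr"))
print(check_flavor("l4x1"), "->", FLAVORS["l4x1"])
```

Validating the flavor before submitting catches typos locally instead of after the job request has been sent.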
📖 Learn More
UV Scripts is a community project showcasing the power of UV for ML workflows.
