
ULTRATHINK


🚀 Production-ready training framework for advanced Large Language Models

Python 3.9+ • License: MIT • PyTorch • Hugging Face • Docker

Quick Start • Features • Documentation • Benchmarks • Comparisons • Roadmap • Contributing


ULTRATHINK provides a complete, modular stack for training custom LLMs with state-of-the-art architectures, distributed training, and comprehensive monitoring.

🎯 Why ULTRATHINK?

Train state-of-the-art LLMs in about 10 lines of code: from prototype to production in minutes, not days.

python train_ultrathink.py \
  --dataset c4 --streaming \
  --hidden_size 768 --num_layers 12 \
  --enable_moe --enable_dre \
  --use_amp --gradient_checkpointing

🏆 What Makes Us Different

| Feature | ULTRATHINK | Others |
|---|---|---|
| Setup Time | ⚡ 5 minutes | 30-120 minutes |
| Lines to Train | 📝 ~10 | 50-100+ |
| MoE Support | ✅ Native | ❌ or Limited |
| Dynamic Reasoning | ✅ Unique | ❌ None |
| Constitutional AI | ✅ Built-in | ❌ None |
| Documentation | 📚 Comprehensive | Varies |

See detailed comparison →

✨ Key Features

  • πŸ—οΈ Modern Architecture - GQA, RoPE, SwiGLU, Flash Attention, RMSNorm
  • 🧠 Advanced Components - Mixture-of-Experts, Dynamic Reasoning Engine, Constitutional AI
  • πŸ“Š Production Monitoring - MLflow, W&B, TensorBoard integration
  • ⚑ Optimized Training - DeepSpeed ZeRO, FSDP, gradient checkpointing, AMP
  • πŸ§ͺ Fully Tested - Unit & integration tests with pytest
  • 🐳 Docker Support - Ready-to-use containers for training and inference
  • πŸ“š Complete Docs - Step-by-step guides for all experience levels

View benchmarks and performance metrics →

🚀 Quick Start

Installation

# Clone repository
git clone https://github.com/vediyappanm/UltraThinking-LLM-Training.git
cd UltraThinking-LLM-Training/deep

# Install dependencies
pip install -r requirements.txt

Training Examples

Tiny Model (CPU-friendly, for testing):

python train_ultrathink.py \
  --dataset wikitext \
  --hidden_size 256 --num_layers 2 --num_heads 4 \
  --batch_size 2 --max_samples 1000 \
  --num_epochs 1

Small Model (GPU recommended):

python train_advanced.py --config configs/train_small.yaml

With Advanced Features:

python train_ultrathink.py \
  --dataset c4 --streaming \
  --hidden_size 768 --num_layers 12 --num_heads 12 \
  --enable_moe --enable_dre --enable_constitutional \
  --use_amp --gradient_checkpointing \
  --use_mlflow
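
When running several variants of a command like this (for example in a sweep), it can help to build the argument list programmatically. The sketch below is a hypothetical helper, not part of ULTRATHINK: `build_command` is a name invented here, and it relies only on the flag spellings shown in this README. `True` values become bare switches; `False` and `None` values are omitted.

```python
import shlex
import sys
from typing import Mapping, Union

FlagValue = Union[bool, int, float, str, None]


def build_command(script: str, flags: Mapping[str, FlagValue]) -> list[str]:
    """Turn a mapping of flag names to values into an argv list."""
    argv = [sys.executable, script]
    for name, value in flags.items():
        if value is None or value is False:
            continue  # switch left off
        argv.append(f"--{name}")
        if value is not True:
            argv.append(str(value))  # flag that takes a value
    return argv


if __name__ == "__main__":
    flags = {
        "dataset": "c4", "streaming": True,
        "hidden_size": 768, "num_layers": 12, "num_heads": 12,
        "enable_moe": True, "enable_dre": True, "enable_constitutional": True,
        "use_amp": True, "gradient_checkpointing": True,
        "use_mlflow": True,
    }
    argv = build_command("train_ultrathink.py", flags)
    # Print the command for inspection; to launch it, pass argv to
    # subprocess.run(argv, check=True) from the directory containing the script.
    print(shlex.join(argv))
```

Passing a list to `subprocess.run` rather than a shell string avoids quoting problems when values contain spaces.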

Docker

# Run Gradio web interface
docker compose up

# Or build and run manually
docker build -t ultrathink:latest .
docker run -p 7860:7860 ultrathink:latest

Testing

# Run all tests
pytest

# Run with coverage
pytest --cov=src --cov-report=html

# Quick smoke test
python tests/smoke_test.py

📚 Documentation

🚀 Getting Started

📊 Performance & Comparisons

🏗️ Architecture & Development

📖 Training Guides

🤝 Community

📖 Full Documentation Index

📁 Project Structure

deep/
├── train_ultrathink.py        # Main training script
├── train_advanced.py          # YAML config-based training
├── app_gradio.py              # Web UI for inference
├── src/
│   ├── models/               # UltraThink, MoE, DRE, architecture
│   ├── data/                 # Datasets, tokenization, validation
│   ├── training/             # Optimizers, distributed, RLHF
│   ├── monitoring/           # Metrics and system monitoring
│   ├── security/             # Input validation and safety
│   └── evaluation/           # Benchmarks and metrics
├── tests/                    # Unit and integration tests
├── configs/                  # YAML configuration files
├── scripts/                  # Utilities (profiling, inference)
└── docs/                     # Documentation and guides

See PROJECT_STRUCTURE.md for detailed explanations.

🔥 Training Examples

Small Dataset Training

# WikiText-2 (fast iteration)
python train_ultrathink.py \
  --dataset wikitext \
  --hidden_size 512 --num_layers 6 --num_heads 8 \
  --batch_size 4 --num_epochs 3 \
  --use_mlflow

Production Training (C4 Dataset)

# Streaming C4 with all optimizations
python train_ultrathink.py \
  --dataset c4 --dataset_subset en --streaming \
  --hidden_size 768 --num_layers 12 --num_heads 12 \
  --batch_size 2 --gradient_accumulation_steps 64 \
  --learning_rate 3e-4 --warmup_steps 5000 \
  --use_amp --gradient_checkpointing \
  --max_seq_length 1024 \
  --output_dir ./outputs/c4_production
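
With gradient accumulation, the optimizer steps once per `gradient_accumulation_steps` micro-batches, so these settings imply a much larger effective batch than `--batch_size 2` suggests. Here is a small worked calculation, assuming a single device and that every sequence is packed or padded to `--max_seq_length` (neither is stated above; `num_devices` is a parameter introduced for this example):

```python
def effective_batch(batch_size: int, grad_accum_steps: int, num_devices: int = 1) -> int:
    """Number of sequences contributing to each optimizer step."""
    return batch_size * grad_accum_steps * num_devices


def tokens_per_step(batch_size: int, grad_accum_steps: int,
                    max_seq_length: int, num_devices: int = 1) -> int:
    """Upper bound on tokens per optimizer step, assuming full-length sequences."""
    return effective_batch(batch_size, grad_accum_steps, num_devices) * max_seq_length


if __name__ == "__main__":
    # Values taken from the C4 command: batch 2, accumulation 64, sequence length 1024.
    print(effective_batch(2, 64))        # 128
    print(tokens_per_step(2, 64, 1024))  # 131072
```

Halving `--batch_size` and doubling `--gradient_accumulation_steps` keeps the effective batch the same while lowering peak activation memory, at the cost of more forward and backward passes per optimizer step.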

Using Configuration Files

# Small model (4-8GB GPU)
python train_advanced.py --config configs/train_small.yaml

# Medium model (16-32GB GPU)
python train_advanced.py --config configs/train_medium.yaml

# Large model (40GB+ GPU)
python train_advanced.py --config configs/train_large.yaml

🐳 Docker Usage

Web Interface (Gradio):

docker compose up
# Visit http://localhost:7860
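
The container can take a while to start serving, so a script that opens the UI right after `docker compose up` may hit a closed port. The sketch below polls the port until it accepts TCP connections. It uses only the Python standard library; `wait_for_port` is a hypothetical helper written for this README, and the only detail taken from above is port 7860.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


if __name__ == "__main__":
    # Short timeout for demonstration; a cold container start may need longer.
    ready = wait_for_port("localhost", 7860, timeout=3.0)
    print("UI reachable" if ready else "UI not reachable yet")
```

An open port only shows that the server socket is listening. It does not confirm that the model has finished loading.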

Custom Training:

docker run -v "$(pwd)/outputs:/app/outputs" ultrathink:latest \
  python train_ultrathink.py \
    --dataset wikitext \
    --hidden_size 256 --num_layers 2 \
    --output_dir /app/outputs/my_model

GPU Training:

docker run --gpus all \
  -v "$(pwd)/outputs:/app/outputs" \
  ultrathink:latest \
  python train_ultrathink.py --use_amp

🀝 Contributing

We welcome contributions! Please see:

🌟 Star History

If you find ULTRATHINK useful, please consider giving us a star! ⭐

Star History Chart

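📊 Model Specifications

| Size | Parameters | Layers | Hidden Size | Context Length | Min GPU Memory |
|---|---|---|---|---|---|
| Tiny | 125M | 12 | 768 | 2048 | 6 GB |
| Small | 350M | 24 | 1024 | 4096 | 16 GB |
| Medium | 760M | 24 | 1536 | 4096 | 24 GB |
| Large | 1.3B | 32 | 2048 | 8192 | 40 GB |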

See MODEL_CARD.md for complete specifications.
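
As a rough sanity check on the figures above, the parameter count of a dense decoder-only transformer is often approximated as 12 × layers × hidden² for the blocks (4 × hidden² for the attention projections plus 8 × hidden² for a feed-forward layer with 4× expansion), plus vocab × hidden for the token embeddings. The sketch below applies that rule of thumb. It is not how ULTRATHINK counts parameters, and the vocabulary size of 50,257 (GPT-2's) is an assumption. The estimate also ignores MoE experts, GQA, and normalization weights, so it will not reproduce every row exactly.

```python
def estimate_params(num_layers: int, hidden_size: int, vocab_size: int = 50_257) -> int:
    """Rule-of-thumb parameter count for a dense decoder-only transformer."""
    block_params = 12 * num_layers * hidden_size ** 2  # attention + feed-forward weights
    embedding_params = vocab_size * hidden_size        # token embedding matrix
    return block_params + embedding_params


if __name__ == "__main__":
    # (layers, hidden size) pairs copied from the specification rows.
    sizes = [("Tiny", 12, 768), ("Small", 24, 1024), ("Medium", 24, 1536), ("Large", 32, 2048)]
    for name, layers, hidden in sizes:
        print(f"{name}: ~{estimate_params(layers, hidden) / 1e6:.0f}M")
```

For the Tiny row this gives roughly 124M, close to the listed 125M. Rows can differ more where the architecture departs from these assumptions.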

📄 License

MIT License - see LICENSE for details.

🙏 Citation

If you use ULTRATHINK in your research or project, please cite:

@software{ultrathink2025,
  title={ULTRATHINK: Advanced LLM Training Framework with Mixture-of-Experts and Dynamic Reasoning},
  author={ULTRATHINK Team},
  year={2025},
  url={https://github.com/vediyappanm/UltraThinking-LLM-Training},
  version={1.0.0}
}

🌐 Community & Support

Discussions Issues Twitter

πŸ’¬ Get Help

πŸš€ Share Your Work

Built something cool with ULTRATHINK? We'd love to hear about it!

  • Open a discussion to share your project
  • Submit a PR to add your model to our showcase
  • Tweet about it and tag us

📢 Stay Updated

  • ⭐ Star this repo to bookmark it and show support
  • 👀 Watch releases to get notified about new features
  • 🐦 Follow on Twitter for updates

Made with ❤️ by the ULTRATHINK Team
