ULTRATHINK
Production-ready training framework for advanced Large Language Models
Quick Start • Features • Documentation • Benchmarks • Comparisons • Roadmap • Contributing
ULTRATHINK provides a complete, modular stack for training custom LLMs with state-of-the-art architectures, distributed training, and comprehensive monitoring.
Why ULTRATHINK?
Train state-of-the-art LLMs in about 10 lines of code: from prototype to production in minutes, not days.
python train_ultrathink.py \
--dataset c4 --streaming \
--hidden_size 768 --num_layers 12 \
--enable_moe --enable_dre \
--use_amp --gradient_checkpointing
What Makes Us Different
| Feature | ULTRATHINK | Others |
|---|---|---|
| Setup Time | ~5 minutes | 30-120 minutes |
| Lines to Train | ~10 | 50-100+ |
| MoE Support | Native | Limited or absent |
| Dynamic Reasoning | Unique | None |
| Constitutional AI | Built-in | None |
| Documentation | Comprehensive | Varies |
Key Features
- Modern Architecture - GQA, RoPE, SwiGLU, Flash Attention, RMSNorm (see the sketch after this list)
- Advanced Components - Mixture-of-Experts, Dynamic Reasoning Engine, Constitutional AI
- Production Monitoring - MLflow, W&B, TensorBoard integration
- Optimized Training - DeepSpeed ZeRO, FSDP, gradient checkpointing, AMP
- Fully Tested - Unit & integration tests with pytest
- Docker Support - Ready-to-use containers for training and inference
- Complete Docs - Step-by-step guides for all experience levels
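As an illustration of one of these components, here is a minimal RMSNorm layer in PyTorch. This is a generic sketch of the technique for readers unfamiliar with it, not the implementation that ships in `src/models/`; class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS of the activations
    with a learned per-channel gain, without LayerNorm's mean subtraction or bias."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))  # learned gain
        self.eps = eps  # numerical stability term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize each vector by its root mean square over the hidden dimension
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```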
View benchmarks and performance metrics →
Quick Start
Installation
# Clone repository
git clone https://github.com/vediyappanm/UltraThinking-LLM-Training.git
cd UltraThinking-LLM-Training/deep
# Install dependencies
pip install -r requirements.txt
Training Examples
Tiny Model (CPU-friendly, for testing):
python train_ultrathink.py \
--dataset wikitext \
--hidden_size 256 --num_layers 2 --num_heads 4 \
--batch_size 2 --max_samples 1000 \
--num_epochs 1
Small Model (GPU recommended):
python train_advanced.py --config configs/train_small.yaml
With Advanced Features:
python train_ultrathink.py \
--dataset c4 --streaming \
--hidden_size 768 --num_layers 12 --num_heads 12 \
--enable_moe --enable_dre --enable_constitutional \
--use_amp --gradient_checkpointing \
--use_mlflow
Docker
# Run Gradio web interface
docker compose up
# Or build and run manually
docker build -t ultrathink:latest .
docker run -p 7860:7860 ultrathink:latest
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Quick smoke test
python tests/smoke_test.py
Documentation
Getting Started
- Training Quickstart - Get started in 5 minutes
- Advanced Training Guide - Deep dive into all features
- Troubleshooting - Common issues and solutions
- Google Colab - Train in the cloud for free
Performance & Comparisons
- Benchmarks - Performance metrics and results
- Framework Comparison - vs GPT-NeoX, Megatron-LM, Axolotl
- Model Card - Model specifications
Architecture & Development
- Architecture Overview - Visual system diagrams
- Project Structure - Understanding the codebase
- Roadmap - Future plans and features
Training Guides
- Small Models - Train on limited hardware
- DeepSpeed Integration - Distributed training setup
- Dataset Configuration - Using custom datasets
Community
- Contributing - Contribution guidelines
- Code of Conduct - Community standards
- Changelog - Version history
Project Structure
deep/
├── train_ultrathink.py     # Main training script
├── train_advanced.py       # YAML config-based training
├── app_gradio.py           # Web UI for inference
├── src/
│   ├── models/             # UltraThink, MoE, DRE, architecture
│   ├── data/               # Datasets, tokenization, validation
│   ├── training/           # Optimizers, distributed, RLHF
│   ├── monitoring/         # Metrics and system monitoring
│   ├── security/           # Input validation and safety
│   └── evaluation/         # Benchmarks and metrics
├── tests/                  # Unit and integration tests
├── configs/                # YAML configuration files
├── scripts/                # Utilities (profiling, inference)
└── docs/                   # Documentation and guides
See PROJECT_STRUCTURE.md for detailed explanations.
Training Examples
Small Dataset Training
# WikiText-2 (fast iteration)
python train_ultrathink.py \
--dataset wikitext \
--hidden_size 512 --num_layers 6 --num_heads 8 \
--batch_size 4 --num_epochs 3 \
--use_mlflow
Production Training (C4 Dataset)
# Streaming C4 with all optimizations
python train_ultrathink.py \
--dataset c4 --dataset_subset en --streaming \
--hidden_size 768 --num_layers 12 --num_heads 12 \
--batch_size 2 --gradient_accumulation_steps 64 \
--learning_rate 3e-4 --warmup_steps 5000 \
--use_amp --gradient_checkpointing \
--max_seq_length 1024 \
--output_dir ./outputs/c4_production
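With these flags the effective batch size is batch_size × gradient_accumulation_steps (times the number of GPUs when training data-parallel). A quick back-of-the-envelope check for the single-GPU case above:

```python
batch_size = 2
gradient_accumulation_steps = 64
max_seq_length = 1024

effective_batch = batch_size * gradient_accumulation_steps  # 128 sequences per optimizer step
tokens_per_step = effective_batch * max_seq_length          # 131,072 tokens per optimizer step
print(effective_batch, tokens_per_step)
```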
Using Configuration Files
# Small model (4-8GB GPU)
python train_advanced.py --config configs/train_small.yaml
# Medium model (16-32GB GPU)
python train_advanced.py --config configs/train_medium.yaml
# Large model (40GB+ GPU)
python train_advanced.py --config configs/train_large.yaml
Docker Usage
Web Interface (Gradio):
docker compose up
# Visit http://localhost:7860
Custom Training:
docker run -v $(pwd)/outputs:/app/outputs ultrathink:latest \
python train_ultrathink.py \
--dataset wikitext \
--hidden_size 256 --num_layers 2 \
--output_dir /app/outputs/my_model
GPU Training:
docker run --gpus all \
-v $(pwd)/outputs:/app/outputs \
ultrathink:latest \
python train_ultrathink.py --use_amp
Contributing
We welcome contributions! Please see:
- CONTRIBUTING.md - Guidelines and setup
- CODE_OF_CONDUCT.md - Community standards
- Roadmap - See what we're building next
Star History
If you find ULTRATHINK useful, please consider giving us a star!
Model Specifications
| Size | Parameters | Layers | Hidden Size | Context Length | Min GPU Memory |
|---|---|---|---|---|---|
| Tiny | 125M | 12 | 768 | 2048 | 6GB |
| Small | 350M | 24 | 1024 | 4096 | 16GB |
| Medium | 760M | 24 | 1536 | 4096 | 24GB |
| Large | 1.3B | 32 | 2048 | 8192 | 40GB |
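As a rough sanity check, a standard dense transformer has on the order of 12·L·h² parameters in its blocks plus V·h in the embeddings. For the Tiny configuration this lands close to the 125M figure above; the larger configurations deviate from this naive estimate because of GQA, SwiGLU widths, MoE, and vocabulary size. A minimal sketch, assuming a ~50K-token vocabulary:

```python
def approx_dense_params(num_layers: int, hidden_size: int, vocab_size: int = 50_000) -> int:
    """Rule-of-thumb count for a standard dense transformer:
    ~12 * L * h^2 in the attention/MLP blocks, plus V * h for the embeddings."""
    return 12 * num_layers * hidden_size ** 2 + vocab_size * hidden_size

# Tiny configuration from the table: 12 layers, hidden size 768
print(f"{approx_dense_params(12, 768) / 1e6:.0f}M")  # ~123M, close to the listed 125M
```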
See MODEL_CARD.md for complete specifications.
License
MIT License - see LICENSE for details.
Citation
If you use ULTRATHINK in your research or project, please cite:
@software{ultrathink2025,
title={ULTRATHINK: Advanced LLM Training Framework with Mixture-of-Experts and Dynamic Reasoning},
author={ULTRATHINK Team},
year={2025},
url={https://github.com/vediyappanm/UltraThinking-LLM-Training},
version={1.0.0}
}
Community & Support
Get Help
- GitHub Discussions - Ask questions, share ideas
- Issue Tracker - Report bugs, request features
- Troubleshooting Guide - Common issues and solutions
- FAQ - Frequently asked questions
Share Your Work
Built something cool with ULTRATHINK? We'd love to hear about it!
- Open a discussion to share your project
- Submit a PR to add your model to our showcase
- Tweet about it and tag us
Stay Updated
- Star this repo to get notifications
- Watch releases for new features
- Follow on Twitter for updates
Made with ❤️ by the ULTRATHINK Team