ULTRATHINK
Production-ready training framework for advanced Large Language Models
Quick Start • Features • Documentation • Benchmarks • Comparisons • Roadmap • Contributing
ULTRATHINK provides a complete, modular stack for training custom LLMs with state-of-the-art architectures, distributed training, and comprehensive monitoring.
Why ULTRATHINK?
Train state-of-the-art LLMs in about 10 lines of code: from prototype to production in minutes, not days.
python train_ultrathink.py \
--dataset c4 --streaming \
--hidden_size 768 --num_layers 12 \
--enable_moe --enable_dre \
--use_amp --gradient_checkpointing
What Makes Us Different
| Feature | ULTRATHINK | Others |
|---|---|---|
| Setup Time | ~5 minutes | 30-120 minutes |
| Lines to Train | ~10 | 50-100+ |
| MoE Support | Native | Limited or absent |
| Dynamic Reasoning | Unique | None |
| Constitutional AI | Built-in | None |
| Documentation | Comprehensive | Varies |
Key Features
- Modern Architecture - GQA, RoPE, SwiGLU, Flash Attention, RMSNorm (see the sketch after this list)
- Advanced Components - Mixture-of-Experts, Dynamic Reasoning Engine, Constitutional AI
- Production Monitoring - MLflow, W&B, TensorBoard integration
- Optimized Training - DeepSpeed ZeRO, FSDP, gradient checkpointing, AMP
- Fully Tested - Unit & integration tests with pytest
- Docker Support - Ready-to-use containers for training and inference
- Complete Docs - Step-by-step guides for all experience levels
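As an illustration of one of these components, here is a minimal RMSNorm layer in PyTorch. This is a generic sketch of the technique for readers unfamiliar with it, not the implementation that ships in `src/models/`; class and argument names are illustrative.

```python
import torch
import torch.nn as nn

class RMSNorm(nn.Module):
    """Root-mean-square normalization: rescale by the RMS of the activations
    with a learned per-channel gain, without LayerNorm's mean subtraction or bias."""

    def __init__(self, hidden_size: int, eps: float = 1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(hidden_size))  # learned gain
        self.eps = eps  # numerical stability term

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Normalize each vector by its root mean square over the hidden dimension
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * (x * rms)
```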
View benchmarks and performance metrics →
Quick Start
Installation
# Clone repository
git clone https://github.com/vediyappanm/UltraThinking-LLM-Training.git
cd UltraThinking-LLM-Training/deep
# Install dependencies
pip install -r requirements.txt
Training Examples
Tiny Model (CPU-friendly, for testing):
python train_ultrathink.py \
--dataset wikitext \
--hidden_size 256 --num_layers 2 --num_heads 4 \
--batch_size 2 --max_samples 1000 \
--num_epochs 1
Small Model (GPU recommended):
python train_advanced.py --config configs/train_small.yaml
With Advanced Features:
python train_ultrathink.py \
--dataset c4 --streaming \
--hidden_size 768 --num_layers 12 --num_heads 12 \
--enable_moe --enable_dre --enable_constitutional \
--use_amp --gradient_checkpointing \
--use_mlflow
Docker
# Run Gradio web interface
docker compose up
# Or build and run manually
docker build -t ultrathink:latest .
docker run -p 7860:7860 ultrathink:latest
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=src --cov-report=html
# Quick smoke test
python tests/smoke_test.py
Documentation
Getting Started
- Training Quickstart - Get started in 5 minutes
- Advanced Training Guide - Deep dive into all features
- Troubleshooting - Common issues and solutions
- Google Colab - Train in the cloud for free
Performance & Comparisons
- Benchmarks - Performance metrics and results
- Framework Comparison - vs GPT-NeoX, Megatron-LM, Axolotl
- Model Card - Model specifications
Architecture & Development
- Architecture Overview - Visual system diagrams
- Project Structure - Understanding the codebase
- Roadmap - Future plans and features
Training Guides
- Small Models - Train on limited hardware
- DeepSpeed Integration - Distributed training setup
- Dataset Configuration - Using custom datasets
Community
- Contributing - Contribution guidelines
- Code of Conduct - Community standards
- Changelog - Version history
Project Structure
deep/
├── train_ultrathink.py     # Main training script
├── train_advanced.py       # YAML config-based training
├── app_gradio.py           # Web UI for inference
├── src/
│   ├── models/             # UltraThink, MoE, DRE, architecture
│   ├── data/               # Datasets, tokenization, validation
│   ├── training/           # Optimizers, distributed, RLHF
│   ├── monitoring/         # Metrics and system monitoring
│   ├── security/           # Input validation and safety
│   └── evaluation/         # Benchmarks and metrics
├── tests/                  # Unit and integration tests
├── configs/                # YAML configuration files
├── scripts/                # Utilities (profiling, inference)
└── docs/                   # Documentation and guides
See PROJECT_STRUCTURE.md for detailed explanations.
Training Examples
Small Dataset Training
# WikiText-2 (fast iteration)
python train_ultrathink.py \
--dataset wikitext \
--hidden_size 512 --num_layers 6 --num_heads 8 \
--batch_size 4 --num_epochs 3 \
--use_mlflow
Production Training (C4 Dataset)
# Streaming C4 with all optimizations
python train_ultrathink.py \
--dataset c4 --dataset_subset en --streaming \
--hidden_size 768 --num_layers 12 --num_heads 12 \
--batch_size 2 --gradient_accumulation_steps 64 \
--learning_rate 3e-4 --warmup_steps 5000 \
--use_amp --gradient_checkpointing \
--max_seq_length 1024 \
--output_dir ./outputs/c4_production
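With these flags the effective batch size is batch_size × gradient_accumulation_steps (times the number of GPUs when training data-parallel). A quick back-of-the-envelope check for the single-GPU case above:

```python
batch_size = 2
gradient_accumulation_steps = 64
max_seq_length = 1024

effective_batch = batch_size * gradient_accumulation_steps  # 128 sequences per optimizer step
tokens_per_step = effective_batch * max_seq_length          # 131,072 tokens per optimizer step
print(effective_batch, tokens_per_step)
```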
Using Configuration Files
# Small model (4-8GB GPU)
python train_advanced.py --config configs/train_small.yaml
# Medium model (16-32GB GPU)
python train_advanced.py --config configs/train_medium.yaml
# Large model (40GB+ GPU)
python train_advanced.py --config configs/train_large.yaml
Docker Usage
Web Interface (Gradio):
docker compose up
# Visit http://localhost:7860
Custom Training:
docker run -v $(pwd)/outputs:/app/outputs ultrathink:latest \
python train_ultrathink.py \
--dataset wikitext \
--hidden_size 256 --num_layers 2 \
--output_dir /app/outputs/my_model
GPU Training:
docker run --gpus all \
-v $(pwd)/outputs:/app/outputs \
ultrathink:latest \
python train_ultrathink.py --use_amp
Contributing
We welcome contributions! Please see:
- CONTRIBUTING.md - Guidelines and setup
- CODE_OF_CONDUCT.md - Community standards
- Roadmap - See what we're building next
Star History
If you find ULTRATHINK useful, please consider giving us a star!
Model Specifications
| Size | Parameters | Layers | Hidden Size | Context Length | Min GPU Memory |
|---|---|---|---|---|---|
| Tiny | 125M | 12 | 768 | 2048 | 6GB |
| Small | 350M | 24 | 1024 | 4096 | 16GB |
| Medium | 760M | 24 | 1536 | 4096 | 24GB |
| Large | 1.3B | 32 | 2048 | 8192 | 40GB |
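As a rough sanity check, a standard dense transformer has on the order of 12·L·h² parameters in its blocks plus V·h in the embeddings. For the Tiny configuration this lands close to the 125M figure above; the larger configurations deviate from this naive estimate because of GQA, SwiGLU widths, MoE, and vocabulary size. A minimal sketch, assuming a ~50K-token vocabulary:

```python
def approx_dense_params(num_layers: int, hidden_size: int, vocab_size: int = 50_000) -> int:
    """Rule-of-thumb count for a standard dense transformer:
    ~12 * L * h^2 in the attention/MLP blocks, plus V * h for the embeddings."""
    return 12 * num_layers * hidden_size ** 2 + vocab_size * hidden_size

# Tiny configuration from the table: 12 layers, hidden size 768
print(f"{approx_dense_params(12, 768) / 1e6:.0f}M")  # ~123M, close to the listed 125M
```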
See MODEL_CARD.md for complete specifications.
License
MIT License - see LICENSE for details.
Citation
If you use ULTRATHINK in your research or project, please cite:
@software{ultrathink2025,
title={ULTRATHINK: Advanced LLM Training Framework with Mixture-of-Experts and Dynamic Reasoning},
author={ULTRATHINK Team},
year={2025},
url={https://github.com/vediyappanm/UltraThinking-LLM-Training},
version={1.0.0}
}
Community & Support
Get Help
- GitHub Discussions - Ask questions, share ideas
- Issue Tracker - Report bugs, request features
- Troubleshooting Guide - Common issues and solutions
- FAQ - Frequently asked questions
Share Your Work
Built something cool with ULTRATHINK? We'd love to hear about it!
- Open a discussion to share your project
- Submit a PR to add your model to our showcase
- Tweet about it and tag us
Stay Updated
- Star this repo to get notifications
- Watch releases for new features
- Follow on Twitter for updates
Made with ❤️ by the ULTRATHINK Team