🚀 Quick Start: Advanced Training Interface

Overview

The Dressify system now provides comprehensive parameter control for both ResNet and ViT training directly from the Gradio interface. You can tweak every aspect of model training without editing code!

🎯 What You Can Control

ResNet Item Embedder

  • Architecture: Backbone (ResNet50/101), embedding dimension, dropout
  • Training: Epochs, batch size, learning rate, optimizer, weight decay, triplet margin
  • Hardware: Mixed precision, memory format, gradient clipping

ViT Outfit Encoder

  • Architecture: Transformer layers, attention heads, feed-forward multiplier, dropout
  • Training: Epochs, batch size, learning rate, optimizer, weight decay, triplet margin
  • Strategy: Mining strategy, augmentation level, random seed

Advanced Settings

  • Learning Rate: Warmup epochs, scheduler type, early stopping patience
  • Optimization: Mixed precision, channels-last memory, gradient clipping
  • Reproducibility: Random seed, deterministic training
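
For reference, "random seed + deterministic training" usually boils down to the standard PyTorch seeding calls sketched below. The interface sets these for you; the `set_seed` helper and its defaults are illustrative, not Dressify's exact code.

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42, deterministic: bool = True) -> None:
    """Seed Python, NumPy, and PyTorch RNGs so runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)        # safe no-op on CPU-only machines
    if deterministic:
        # Deterministic cuDNN kernels trade a little speed for repeatability.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

set_seed(42)
```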

🚀 Quick Start Steps

1. Launch the App

```bash
python app.py
```

2. Go to Advanced Training Tab

  • Click on the "🔬 Advanced Training" tab
  • You'll see comprehensive parameter controls organized in sections

3. Choose Your Training Mode

Quick Training (Basic)

  • Set ResNet epochs: 5-10
  • Set ViT epochs: 10-20
  • Click "🚀 Start Quick Training"

Advanced Training (Custom)

  • Adjust all parameters to your liking
  • Click "🎯 Start Advanced Training"

4. Monitor Progress

  • Watch the training log for real-time updates
  • Check the Status tab for system health
  • Download models from the Downloads tab when complete

🔬 Parameter Tuning Examples

Fast Experimentation

```
# Quick test (5-10 minutes)
ResNet: epochs=5, batch_size=16, lr=1e-3
ViT: epochs=10, batch_size=16, lr=5e-4
```

Standard Training

```
# Balanced quality (1-2 hours)
ResNet: epochs=20, batch_size=64, lr=1e-3
ViT: epochs=30, batch_size=32, lr=5e-4
```

High Quality Training

```
# Production models (4-6 hours)
ResNet: epochs=50, batch_size=32, lr=5e-4
ViT: epochs=100, batch_size=16, lr=1e-4
```

Research Experiments

```
# Maximum capacity
ResNet: backbone=resnet101, embedding_dim=768
ViT: layers=8, heads=12, mining_strategy=hardest
```
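
If you want to keep these presets handy outside the UI, they can be written down as ordinary Python dictionaries and transferred to the sliders by hand. The key names below are illustrative, not the app's actual configuration schema.

```python
# Illustrative presets mirroring the examples above; key names are hypothetical.
PRESETS = {
    "fast": {
        "resnet": {"epochs": 5, "batch_size": 16, "lr": 1e-3},
        "vit": {"epochs": 10, "batch_size": 16, "lr": 5e-4},
    },
    "standard": {
        "resnet": {"epochs": 20, "batch_size": 64, "lr": 1e-3},
        "vit": {"epochs": 30, "batch_size": 32, "lr": 5e-4},
    },
    "high_quality": {
        "resnet": {"epochs": 50, "batch_size": 32, "lr": 5e-4},
        "vit": {"epochs": 100, "batch_size": 16, "lr": 1e-4},
    },
    "research": {
        "resnet": {"backbone": "resnet101", "embedding_dim": 768},
        "vit": {"layers": 8, "heads": 12, "mining_strategy": "hardest"},
    },
}

print(PRESETS["standard"]["vit"])   # the ViT values from the Standard Training example
```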

🎯 Key Parameters to Experiment With

High Impact (Try First)

  1. Learning Rate: 1e-4 to 1e-2
  2. Batch Size: 16 to 128
  3. Triplet Margin: 0.1 to 0.5
  4. Epochs: 5 to 100

Medium Impact

  1. Embedding Dimension: 256, 512, 768, 1024
  2. Transformer Layers: 4, 6, 8, 12
  3. Optimizer: AdamW, Adam, SGD, RMSprop (see the optimizer sketch below)

Fine-tuning

  1. Weight Decay: 1e-6 to 1e-1
  2. Dropout: 0.0 to 0.5
  3. Attention Heads: 4, 8, 16
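
The optimizer and weight-decay options above map directly onto classes in torch.optim. A minimal sketch of how a dropdown choice typically becomes an optimizer instance; `build_optimizer` and the throwaway linear model are illustrative, not part of the app:

```python
import torch
import torch.nn as nn

# The dropdown strings correspond to these torch.optim classes.
OPTIMIZERS = {
    "AdamW": torch.optim.AdamW,
    "Adam": torch.optim.Adam,
    "SGD": torch.optim.SGD,
    "RMSprop": torch.optim.RMSprop,
}

def build_optimizer(model: nn.Module, name: str = "AdamW",
                    lr: float = 1e-3, weight_decay: float = 1e-4):
    """Resolve an optimizer name plus lr / weight decay into an optimizer instance."""
    return OPTIMIZERS[name](model.parameters(), lr=lr, weight_decay=weight_decay)

# Throwaway model, just to show the call shape.
optimizer = build_optimizer(nn.Linear(512, 512), name="AdamW",
                            lr=1e-3, weight_decay=1e-4)
```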

📊 Training Workflow

1. Start Simple 🚀

  • Use default parameters first
  • Run quick training (5-10 epochs)
  • Verify system works

2. Experiment Systematically 🔍

  • Change one parameter at a time
  • Start with learning rate and batch size
  • Document every change (a logging sketch follows this workflow)

3. Validate Results ✅

  • Compare training curves
  • Check validation metrics
  • Ensure improvements are consistent

4. Scale Up 📈

  • Use best parameters for longer training
  • Increase epochs gradually
  • Monitor for overfitting
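
Step 2's "document every change" can be as simple as appending each run's parameters and results to a JSON-lines file. A sketch; the file name, `log_run` helper, and record fields are arbitrary choices, not part of Dressify:

```python
import json
import time
from pathlib import Path

def log_run(params: dict, metrics: dict, path: str = "experiments.jsonl") -> None:
    """Append one run (parameters + results) as a single JSON line."""
    record = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "params": params,
        "metrics": metrics,
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example call after a run finishes (numbers are placeholders):
log_run({"model": "vit", "epochs": 30, "batch_size": 32, "lr": 5e-4},
        {"train_loss": 0.21, "val_loss": 0.27})
```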

🧪 Monitoring Training

What to Watch

  • Training Loss: Should decrease steadily
  • Validation Loss: Should decrease without overfitting
  • Training Time: Per-epoch timing
  • GPU Memory: VRAM usage (see the monitoring sketch below)
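
The last two items need nothing beyond the standard library and torch.cuda. A minimal per-epoch sketch; `report_epoch` is an illustrative helper, not part of the app:

```python
import time

import torch

def report_epoch(epoch: int, start_time: float) -> None:
    """Print per-epoch wall-clock time and peak VRAM usage."""
    elapsed = time.perf_counter() - start_time
    if torch.cuda.is_available():
        peak_gb = torch.cuda.max_memory_allocated() / 1024**3
        torch.cuda.reset_peak_memory_stats()        # fresh peak for the next epoch
        print(f"epoch {epoch}: {elapsed:.1f}s, peak VRAM {peak_gb:.2f} GB")
    else:
        print(f"epoch {epoch}: {elapsed:.1f}s (CPU run)")

start = time.perf_counter()
# ... one epoch of training would run here ...
report_epoch(epoch=1, start_time=start)
```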

Success Signs

  • Smooth loss curves
  • Consistent improvement
  • Good generalization

Warning Signs

  • Loss spikes or plateaus
  • Validation loss increases
  • Training becomes unstable

🔧 Advanced Features

Mixed Precision Training

  • Enable: Faster training, less memory
  • Disable: More stable, higher precision
  • Default: Enabled (recommended)
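
Behind the toggle, mixed precision is plain PyTorch AMP (autocast plus a GradScaler). A minimal sketch of a single training step, with a throwaway linear model standing in for the real embedder; this is not Dressify's actual training loop:

```python
import torch
import torch.nn as nn

def training_step(model, loss_fn, optimizer, scaler, images, targets, use_amp):
    """One optimizer step with optional automatic mixed precision."""
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=use_amp):   # forward pass in fp16 where safe
        loss = loss_fn(model(images), targets)
    scaler.scale(loss).backward()    # loss scaling guards against fp16 underflow
    scaler.step(optimizer)           # unscales grads; skips the step if they overflowed
    scaler.update()
    return loss.item()

# Tiny demo on a throwaway model; AMP only activates when a GPU is present.
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"
model = nn.Linear(512, 512).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
x = torch.randn(8, 512, device=device)
training_step(model, nn.functional.mse_loss, optimizer, scaler, x, x, use_amp)
```

Passing `enabled=False` to both autocast and GradScaler makes the same code run in full precision, which is what the "Disable" option corresponds to.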

Triplet Mining Strategies

  • Semi-hard: Balanced difficulty (default)
  • Hardest: Maximum challenge
  • Random: Simple but less effective
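
All three strategies select negatives per batch from the pairwise distance matrix: "hardest" takes the closest negative for each anchor, while "semi-hard" prefers negatives that are farther than the positive but still inside the margin, falling back to the hardest negative when none qualify. A compact sketch of that selection logic under those assumptions; `mine_triplet_loss` is illustrative, not the app's implementation:

```python
import torch
import torch.nn.functional as F

def mine_triplet_loss(emb, labels, margin=0.3, strategy="semi-hard"):
    """Batch triplet loss with hardest / semi-hard / random negative selection."""
    emb = F.normalize(emb, dim=1)                       # unit-length embeddings
    dist = torch.cdist(emb, emb)                        # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # True where labels match
    idx = torch.arange(len(labels), device=labels.device)
    losses = []
    for a in range(emb.size(0)):
        pos = torch.where(same[a] & (idx != a))[0]
        neg = torch.where(~same[a])[0]
        if len(pos) == 0 or len(neg) == 0:
            continue                                    # anchor has no valid triplet
        d_ap = dist[a, pos].max()                       # hardest positive
        d_an = dist[a, neg]
        if strategy == "hardest":
            d_neg = d_an.min()
        elif strategy == "semi-hard":                   # farther than positive, inside margin
            ok = (d_an > d_ap) & (d_an < d_ap + margin)
            d_neg = d_an[ok].min() if ok.any() else d_an.min()
        else:                                           # "random"
            d_neg = d_an[torch.randint(len(neg), (1,)).item()]
        losses.append(F.relu(d_ap - d_neg + margin))
    return torch.stack(losses).mean() if losses else emb.sum() * 0.0

emb = torch.randn(16, 512)              # stand-in for a batch of item embeddings
labels = torch.randint(0, 4, (16,))     # stand-in for outfit/category IDs
loss = mine_triplet_loss(emb, labels, margin=0.3, strategy="semi-hard")
```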

Data Augmentation

  • Minimal: Basic transforms
  • Standard: Balanced augmentation (default)
  • Aggressive: Heavy augmentation
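
The three levels roughly translate into progressively heavier torchvision pipelines. The exact transforms and parameters Dressify uses may differ; this sketch only illustrates the idea:

```python
from torchvision import transforms as T

# ImageNet statistics, the usual choice for ResNet/ViT backbones.
NORMALIZE = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

AUGMENTATIONS = {
    "minimal": T.Compose([
        T.Resize((224, 224)),
        T.ToTensor(), NORMALIZE,
    ]),
    "standard": T.Compose([
        T.RandomResizedCrop(224, scale=(0.8, 1.0)),
        T.RandomHorizontalFlip(),
        T.ColorJitter(brightness=0.2, contrast=0.2),
        T.ToTensor(), NORMALIZE,
    ]),
    "aggressive": T.Compose([
        T.RandomResizedCrop(224, scale=(0.6, 1.0)),
        T.RandomHorizontalFlip(),
        T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
        T.RandomGrayscale(p=0.1),
        T.ToTensor(), NORMALIZE,
        T.RandomErasing(p=0.25),            # operates on the tensor, so it comes last
    ]),
}
```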

πŸ“ Best Practices

1. Document Everything πŸ“š

  • Save parameter combinations
  • Record training results
  • Note hardware specifications

2. Start Small 🔬

  • Test with a few epochs first
  • Validate promising combinations
  • Scale up gradually

3. Monitor Resources 💻

  • Watch GPU memory usage
  • Check training time per epoch
  • Balance quality vs. speed

4. Save Checkpoints 💾

  • Models are saved automatically
  • Keep intermediate checkpoints
  • Download final models
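
The app writes checkpoints for you, but if you want to save or reload intermediate state yourself, the usual torch.save / torch.load pattern applies. The paths, helper names, and dictionary keys below are illustrative:

```python
import os

import torch

def save_checkpoint(model, optimizer, epoch, path="checkpoints/epoch_{:03d}.pt"):
    """Write model + optimizer state so training can resume from this epoch."""
    filename = path.format(epoch)
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    torch.save({"epoch": epoch,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict()}, filename)

def load_checkpoint(model, optimizer, path):
    """Restore a saved run; returns the epoch at which it stopped."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"]
```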

🚨 Common Issues & Solutions

Training Too Slow

  • Increase batch size (if GPU memory allows)
  • Increase learning rate
  • Use mixed precision
  • Reduce embedding dimension

Training Unstable

  • Reduce learning rate
  • Increase batch size
  • Enable gradient clipping (sketched below)
  • Check data quality
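
Gradient clipping rescales gradients whose global norm exceeds a threshold, which is usually what tames loss spikes. A minimal, self-contained sketch of where the call sits relative to backward() and step(); the throwaway model and max_norm value are illustrative:

```python
import torch
import torch.nn as nn

# Tiny self-contained example: one clipped update on a throwaway model.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Rescale gradients so their global L2 norm is at most max_norm, then step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

If mixed precision is enabled, call scaler.unscale_(optimizer) before clipping so the norm is measured on the real (unscaled) gradients.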

Out of Memory

  • Reduce batch size
  • Reduce embedding dimension
  • Use mixed precision
  • Reduce transformer layers

Poor Results

  • Increase epochs
  • Adjust learning rate
  • Try different optimizers
  • Check data preprocessing

📚 Next Steps

1. Read the Full Guide

  • See TRAINING_PARAMETERS.md for detailed explanations
  • Understand parameter impact and trade-offs

2. Run Experiments

  • Start with quick training
  • Experiment with different parameters
  • Document your findings

3. Optimize for Your Use Case

  • Balance quality vs. speed
  • Consider hardware constraints
  • Aim for reproducible results

4. Share Results

  • Document successful configurations
  • Share insights with the community
  • Contribute to best practices

🎉 You're ready to start experimenting!

Remember: Start simple, change one thing at a time, and document everything. Happy training! 🚀