Quick Start: Advanced Training Interface
Overview
The Dressify system now provides comprehensive parameter control for both ResNet and ViT training directly from the Gradio interface. You can tweak every aspect of model training without editing code!
What You Can Control
ResNet Item Embedder
- Architecture: Backbone (ResNet50/101), embedding dimension, dropout
- Training: Epochs, batch size, learning rate, optimizer, weight decay, triplet margin
- Hardware: Mixed precision, memory format, gradient clipping
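For reference, here is the same set of knobs as a plain Python dict. This is only an illustration of the parameter groups; the key names are hypothetical, not the app's actual config schema:

```python
# Illustrative only: these key names are hypothetical, not the app's real schema.
resnet_config = {
    # Architecture
    "backbone": "resnet50",      # or "resnet101"
    "embedding_dim": 512,
    "dropout": 0.1,
    # Training
    "epochs": 20,
    "batch_size": 64,
    "learning_rate": 1e-3,
    "optimizer": "adamw",
    "weight_decay": 1e-4,
    "triplet_margin": 0.2,
    # Hardware
    "mixed_precision": True,
    "channels_last": True,       # channels-last memory format
    "grad_clip_norm": 1.0,
}
```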
ViT Outfit Encoder
- Architecture: Transformer layers, attention heads, feed-forward multiplier, dropout
- Training: Epochs, batch size, learning rate, optimizer, weight decay, triplet margin
- Strategy: Mining strategy, augmentation level, random seed
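To make the architecture knobs concrete, here is roughly how layers, heads, feed-forward multiplier, and dropout map onto a standard PyTorch transformer encoder (an illustrative sketch, not Dressify's actual model code):

```python
import torch.nn as nn

# Illustrative sketch: how the ViT encoder knobs map onto a standard
# PyTorch transformer encoder (not Dressify's actual model code).
embed_dim, num_heads, ff_mult, num_layers, dropout = 512, 8, 4, 6, 0.1

encoder_layer = nn.TransformerEncoderLayer(
    d_model=embed_dim,
    nhead=num_heads,                      # attention heads
    dim_feedforward=embed_dim * ff_mult,  # feed-forward multiplier
    dropout=dropout,
    batch_first=True,
)
outfit_encoder = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
```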
Advanced Settings
- Learning Rate: Warmup epochs, scheduler type, early stopping patience
- Optimization: Mixed precision, channels-last memory, gradient clipping
- Reproducibility: Random seed, deterministic training
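The reproducibility settings usually boil down to something like this in plain PyTorch (a generic sketch; Dressify's exact implementation may differ):

```python
import random
import numpy as np
import torch

def set_seed(seed: int = 42, deterministic: bool = True) -> None:
    """Seed all common RNGs; optionally force deterministic cuDNN kernels."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    if deterministic:
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
```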
Quick Start Steps
1. Launch the App
```bash
python app.py
```
2. Go to Advanced Training Tab
- Click on the "Advanced Training" tab
- You'll see comprehensive parameter controls organized in sections
3. Choose Your Training Mode
Quick Training (Basic)
- Set ResNet epochs: 5-10
- Set ViT epochs: 10-20
- Click "Start Quick Training"
Advanced Training (Custom)
- Adjust all parameters to your liking
- Click "Start Advanced Training"
4. Monitor Progress
- Watch the training log for real-time updates
- Check the Status tab for system health
- Download models from the Downloads tab when complete
Parameter Tuning Examples
Fast Experimentation
```
# Quick test (5-10 minutes)
ResNet: epochs=5, batch_size=16, lr=1e-3
ViT: epochs=10, batch_size=16, lr=5e-4
```
Standard Training
```
# Balanced quality (1-2 hours)
ResNet: epochs=20, batch_size=64, lr=1e-3
ViT: epochs=30, batch_size=32, lr=5e-4
```
High Quality Training
```
# Production models (4-6 hours)
ResNet: epochs=50, batch_size=32, lr=5e-4
ViT: epochs=100, batch_size=16, lr=1e-4
```
Research Experiments
```
# Maximum capacity
ResNet: backbone=resnet101, embedding_dim=768
ViT: layers=8, heads=12, mining_strategy=hardest
```
Key Parameters to Experiment With
High Impact (Try First)
- Learning Rate: 1e-4 to 1e-2
- Batch Size: 16 to 128
- Triplet Margin: 0.1 to 0.5
- Epochs: 5 to 100
Medium Impact
- Embedding Dimension: 256, 512, 768, 1024
- Transformer Layers: 4, 6, 8, 12
- Optimizer: AdamW, Adam, SGD, RMSprop
Fine-tuning
- Weight Decay: 1e-6 to 1e-1
- Dropout: 0.0 to 0.5
- Attention Heads: 4, 8, 16
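To act on the high-impact list above, sweep one parameter at a time and keep everything else fixed. A minimal sketch, where `train_item_embedder` and `evaluate` are hypothetical stand-ins for your actual training entry point:

```python
# Hypothetical sketch: train_item_embedder / evaluate stand in for your actual
# training entry point. Only the learning rate changes between runs.
results = {}
for lr in (1e-4, 5e-4, 1e-3, 5e-3, 1e-2):
    model = train_item_embedder(learning_rate=lr, epochs=5, batch_size=32)
    results[lr] = evaluate(model)          # e.g. validation triplet loss
    print(f"lr={lr:.0e} -> val_loss={results[lr]:.4f}")

best_lr = min(results, key=results.get)    # lowest validation loss wins
```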
Training Workflow
1. Start Simple
- Use default parameters first
- Run quick training (5-10 epochs)
- Verify system works
2. Experiment Systematically
- Change one parameter at a time
- Start with learning rate and batch size
- Document every change
3. Validate Results
- Compare training curves
- Check validation metrics
- Ensure improvements are consistent
4. Scale Up
- Use best parameters for longer training
- Increase epochs gradually
- Monitor for overfitting
Monitoring Training
What to Watch
- Training Loss: Should decrease steadily
- Validation Loss: Should decrease without overfitting
- Training Time: Per epoch timing
- GPU Memory: VRAM usage
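Per-epoch timing and peak VRAM can be checked with standard PyTorch calls (a generic sketch; requires a CUDA device):

```python
import time
import torch

torch.cuda.reset_peak_memory_stats()   # call at the start of an epoch
epoch_start = time.time()

# ... run one training epoch here ...

elapsed = time.time() - epoch_start
peak_gb = torch.cuda.max_memory_allocated() / 1024**3
print(f"epoch time: {elapsed:.1f}s, peak VRAM: {peak_gb:.2f} GB")
```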
Success Signs
- Smooth loss curves
- Consistent improvement
- Good generalization
Warning Signs
- Loss spikes or plateaus
- Validation loss increases
- Training becomes unstable
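The rising-validation-loss warning is exactly what the early stopping patience setting guards against. A minimal, generic patience check (with `run_validation` and `save_checkpoint` as hypothetical helpers):

```python
# Generic patience check; run_validation and save_checkpoint are hypothetical helpers.
max_epochs, patience = 100, 5
best_val, bad_epochs = float("inf"), 0

for epoch in range(max_epochs):
    val_loss = run_validation()        # returns validation loss for this epoch
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        save_checkpoint()              # keep the best model seen so far
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Early stopping at epoch {epoch}")
            break
```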
Advanced Features
Mixed Precision Training
- Enable: Faster training, less memory
- Disable: More stable, higher precision
- Default: Enabled (recommended)
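In plain PyTorch terms, mixed precision training typically looks like the AMP pattern below (a generic sketch with hypothetical `model`, `criterion`, `optimizer`, and `loader`; not Dressify's exact training loop):

```python
import torch

# Generic AMP pattern; model, criterion, optimizer, and loader are placeholders.
scaler = torch.cuda.amp.GradScaler()

for images, targets in loader:
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():        # forward pass in float16 where safe
        embeddings = model(images)
        loss = criterion(embeddings, targets)
    scaler.scale(loss).backward()          # scaled backward avoids underflow
    scaler.step(optimizer)
    scaler.update()
```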
Triplet Mining Strategies
- Semi-hard: Balanced difficulty (default)
- Hardest: Maximum challenge
- Random: Simple but less effective
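Whatever the mining strategy, the triplet margin setting ends up in a loss like PyTorch's built-in `TripletMarginLoss`; the strategy only decides which anchor/positive/negative embeddings get fed in:

```python
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=0.2)   # the "triplet margin" setting

# Anchor / positive / negative embeddings, e.g. 32 samples of dimension 512.
# In practice the mining strategy picks which samples end up in each role.
anchor   = torch.randn(32, 512)
positive = torch.randn(32, 512)
negative = torch.randn(32, 512)

loss = triplet_loss(anchor, positive, negative)
```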
Data Augmentation
- Minimal: Basic transforms
- Standard: Balanced augmentation (default)
- Aggressive: Heavy augmentation
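As a rough idea of what the three levels could translate to with torchvision transforms (illustrative only; the app's actual pipelines may differ):

```python
from torchvision import transforms

minimal = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

standard = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])

aggressive = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),   # operates on tensors, so it comes after ToTensor
])
```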
Best Practices
1. Document Everything
- Save parameter combinations
- Record training results
- Note hardware specifications
2. Start Small
- Test with few epochs first
- Validate promising combinations
- Scale up gradually
3. Monitor Resources
- Watch GPU memory usage
- Check training time per epoch
- Balance quality vs. speed
4. Save Checkpoints
- Models are saved automatically
- Keep intermediate checkpoints
- Download final models
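Checkpointing is handled by the app, but if you want to keep your own intermediate checkpoints, the standard PyTorch pattern looks like this (with `model`, `optimizer`, and `epoch` as placeholders from your training loop):

```python
import torch

# Save (e.g. at the end of each epoch); model, optimizer, epoch come from your loop.
torch.save(
    {"epoch": epoch, "model": model.state_dict(), "optimizer": optimizer.state_dict()},
    f"checkpoint_epoch{epoch}.pt",
)

# Load later to resume or evaluate.
ckpt = torch.load("checkpoint_epoch10.pt", map_location="cpu")
model.load_state_dict(ckpt["model"])
```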
Common Issues & Solutions
Training Too Slow
- Increase batch size (if memory allows, for better GPU utilization)
- Increase learning rate
- Use mixed precision
- Reduce embedding dimension
Training Unstable
- Reduce learning rate
- Increase batch size
- Enable gradient clipping (see the sketch after this list)
- Check data quality
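Gradient clipping is a one-liner in plain PyTorch; `loss`, `model`, and `optimizer` below are placeholders from your training loop:

```python
import torch

# Generic PyTorch pattern: clip the gradient norm before the optimizer step.
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```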
Out of Memory
- Reduce batch size
- Reduce embedding dimension
- Use mixed precision
- Reduce transformer layers
Poor Results
- Increase epochs
- Adjust learning rate
- Try different optimizers
- Check data preprocessing
Next Steps
1. Read the Full Guide
- See TRAINING_PARAMETERS.md for detailed explanations
- Understand parameter impact and trade-offs
2. Run Experiments
- Start with quick training
- Experiment with different parameters
- Document your findings
3. Optimize for Your Use Case
- Balance quality vs. speed
- Consider hardware constraints
- Aim for reproducible results
4. Share Results
- Document successful configurations
- Share insights with the community
- Contribute to best practices
You're ready to start experimenting!
Remember: Start simple, change one thing at a time, and document everything. Happy training!