🚀 Quick Start: Advanced Training Interface

Overview

The Dressify system now provides comprehensive parameter control for both ResNet and ViT training directly from the Gradio interface. You can tweak every aspect of model training without editing code!

🎯 What You Can Control

ResNet Item Embedder

  • Architecture: Backbone (ResNet50/101), embedding dimension, dropout
  • Training: Epochs, batch size, learning rate, optimizer, weight decay, triplet margin
  • Hardware: Mixed precision, memory format, gradient clipping

ViT Outfit Encoder

  • Architecture: Transformer layers, attention heads, feed-forward multiplier, dropout
  • Training: Epochs, batch size, learning rate, optimizer, weight decay, triplet margin
  • Strategy: Mining strategy, augmentation level, random seed

Advanced Settings

  • Learning Rate: Warmup epochs, scheduler type, early stopping patience
  • Optimization: Mixed precision, channels-last memory, gradient clipping
  • Reproducibility: Random seed, deterministic training
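
For reference, "random seed + deterministic training" usually boils down to the standard PyTorch seeding calls sketched below. The interface sets these for you; the `set_seed` helper and its defaults are illustrative, not Dressify's exact code.

```python
import random

import numpy as np
import torch

def set_seed(seed: int = 42, deterministic: bool = True) -> None:
    """Seed Python, NumPy, and PyTorch RNGs so runs are repeatable."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)        # safe no-op on CPU-only machines
    if deterministic:
        # Deterministic cuDNN kernels trade a little speed for repeatability.
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False

set_seed(42)
```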

🚀 Quick Start Steps

1. Launch the App

```bash
python app.py
```

2. Go to Advanced Training Tab

  • Click on the "🔬 Advanced Training" tab
  • You'll see comprehensive parameter controls organized in sections

3. Choose Your Training Mode

Quick Training (Basic)

  • Set ResNet epochs: 5-10
  • Set ViT epochs: 10-20
  • Click "🚀 Start Quick Training"

Advanced Training (Custom)

  • Adjust all parameters to your liking
  • Click "🎯 Start Advanced Training"

4. Monitor Progress

  • Watch the training log for real-time updates
  • Check the Status tab for system health
  • Download models from the Downloads tab when complete

🔬 Parameter Tuning Examples

Fast Experimentation

```
# Quick test (5-10 minutes)
ResNet: epochs=5, batch_size=16, lr=1e-3
ViT: epochs=10, batch_size=16, lr=5e-4
```

Standard Training

```
# Balanced quality (1-2 hours)
ResNet: epochs=20, batch_size=64, lr=1e-3
ViT: epochs=30, batch_size=32, lr=5e-4
```

High Quality Training

```
# Production models (4-6 hours)
ResNet: epochs=50, batch_size=32, lr=5e-4
ViT: epochs=100, batch_size=16, lr=1e-4
```

Research Experiments

```
# Maximum capacity
ResNet: backbone=resnet101, embedding_dim=768
ViT: layers=8, heads=12, mining_strategy=hardest
```
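
If you want to keep these presets handy outside the UI, they can be written down as ordinary Python dictionaries and transferred to the sliders by hand. The key names below are illustrative, not the app's actual configuration schema.

```python
# Illustrative presets mirroring the examples above; key names are hypothetical.
PRESETS = {
    "fast": {
        "resnet": {"epochs": 5, "batch_size": 16, "lr": 1e-3},
        "vit": {"epochs": 10, "batch_size": 16, "lr": 5e-4},
    },
    "standard": {
        "resnet": {"epochs": 20, "batch_size": 64, "lr": 1e-3},
        "vit": {"epochs": 30, "batch_size": 32, "lr": 5e-4},
    },
    "high_quality": {
        "resnet": {"epochs": 50, "batch_size": 32, "lr": 5e-4},
        "vit": {"epochs": 100, "batch_size": 16, "lr": 1e-4},
    },
    "research": {
        "resnet": {"backbone": "resnet101", "embedding_dim": 768},
        "vit": {"layers": 8, "heads": 12, "mining_strategy": "hardest"},
    },
}

print(PRESETS["standard"]["vit"])   # the ViT values from the Standard Training example
```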

🎯 Key Parameters to Experiment With

High Impact (Try First)

  1. Learning Rate: 1e-4 to 1e-2
  2. Batch Size: 16 to 128
  3. Triplet Margin: 0.1 to 0.5
  4. Epochs: 5 to 100

Medium Impact

  1. Embedding Dimension: 256, 512, 768, 1024
  2. Transformer Layers: 4, 6, 8, 12
  3. Optimizer: AdamW, Adam, SGD, RMSprop (see the optimizer sketch below)

Fine-tuning

  1. Weight Decay: 1e-6 to 1e-1
  2. Dropout: 0.0 to 0.5
  3. Attention Heads: 4, 8, 16
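
The optimizer and weight-decay options above map directly onto classes in torch.optim. A minimal sketch of how a dropdown choice typically becomes an optimizer instance; `build_optimizer` and the throwaway linear model are illustrative, not part of the app:

```python
import torch
import torch.nn as nn

# The dropdown strings correspond to these torch.optim classes.
OPTIMIZERS = {
    "AdamW": torch.optim.AdamW,
    "Adam": torch.optim.Adam,
    "SGD": torch.optim.SGD,
    "RMSprop": torch.optim.RMSprop,
}

def build_optimizer(model: nn.Module, name: str = "AdamW",
                    lr: float = 1e-3, weight_decay: float = 1e-4):
    """Resolve an optimizer name plus lr / weight decay into an optimizer instance."""
    return OPTIMIZERS[name](model.parameters(), lr=lr, weight_decay=weight_decay)

# Throwaway model, just to show the call shape.
optimizer = build_optimizer(nn.Linear(512, 512), name="AdamW",
                            lr=1e-3, weight_decay=1e-4)
```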

📊 Training Workflow

1. Start Simple 🚀

  • Use default parameters first
  • Run quick training (5-10 epochs)
  • Verify system works

2. Experiment Systematically 🔍

  • Change one parameter at a time
  • Start with learning rate and batch size
  • Document every change (a logging sketch follows this workflow)

3. Validate Results ✅

  • Compare training curves
  • Check validation metrics
  • Ensure improvements are consistent

4. Scale Up 📈

  • Use best parameters for longer training
  • Increase epochs gradually
  • Monitor for overfitting
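
Step 2's "document every change" can be as simple as appending each run's parameters and results to a JSON-lines file. A sketch; the file name, `log_run` helper, and record fields are arbitrary choices, not part of Dressify:

```python
import json
import time
from pathlib import Path

def log_run(params: dict, metrics: dict, path: str = "experiments.jsonl") -> None:
    """Append one run (parameters + results) as a single JSON line."""
    record = {
        "timestamp": time.strftime("%Y-%m-%d %H:%M:%S"),
        "params": params,
        "metrics": metrics,
    }
    with Path(path).open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")

# Example call after a run finishes (numbers are placeholders):
log_run({"model": "vit", "epochs": 30, "batch_size": 32, "lr": 5e-4},
        {"train_loss": 0.21, "val_loss": 0.27})
```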

🧪 Monitoring Training

What to Watch

  • Training Loss: Should decrease steadily
  • Validation Loss: Should decrease without overfitting
  • Training Time: Per-epoch timing
  • GPU Memory: VRAM usage (see the monitoring sketch below)
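
The last two items need nothing beyond the standard library and torch.cuda. A minimal per-epoch sketch; `report_epoch` is an illustrative helper, not part of the app:

```python
import time

import torch

def report_epoch(epoch: int, start_time: float) -> None:
    """Print per-epoch wall-clock time and peak VRAM usage."""
    elapsed = time.perf_counter() - start_time
    if torch.cuda.is_available():
        peak_gb = torch.cuda.max_memory_allocated() / 1024**3
        torch.cuda.reset_peak_memory_stats()        # fresh peak for the next epoch
        print(f"epoch {epoch}: {elapsed:.1f}s, peak VRAM {peak_gb:.2f} GB")
    else:
        print(f"epoch {epoch}: {elapsed:.1f}s (CPU run)")

start = time.perf_counter()
# ... one epoch of training would run here ...
report_epoch(epoch=1, start_time=start)
```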

Success Signs

  • Smooth loss curves
  • Consistent improvement
  • Good generalization

Warning Signs

  • Loss spikes or plateaus
  • Validation loss increases
  • Training becomes unstable

🔧 Advanced Features

Mixed Precision Training

  • Enable: Faster training, less memory
  • Disable: More stable, higher precision
  • Default: Enabled (recommended)
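
Behind the toggle, mixed precision is plain PyTorch AMP (autocast plus a GradScaler). A minimal sketch of a single training step, with a throwaway linear model standing in for the real embedder; this is not Dressify's actual training loop:

```python
import torch
import torch.nn as nn

def training_step(model, loss_fn, optimizer, scaler, images, targets, use_amp):
    """One optimizer step with optional automatic mixed precision."""
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast(enabled=use_amp):   # forward pass in fp16 where safe
        loss = loss_fn(model(images), targets)
    scaler.scale(loss).backward()    # loss scaling guards against fp16 underflow
    scaler.step(optimizer)           # unscales grads; skips the step if they overflowed
    scaler.update()
    return loss.item()

# Tiny demo on a throwaway model; AMP only activates when a GPU is present.
use_amp = torch.cuda.is_available()
device = "cuda" if use_amp else "cpu"
model = nn.Linear(512, 512).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler(enabled=use_amp)
x = torch.randn(8, 512, device=device)
training_step(model, nn.functional.mse_loss, optimizer, scaler, x, x, use_amp)
```

Passing `enabled=False` to both autocast and GradScaler makes the same code run in full precision, which is what the "Disable" option corresponds to.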

Triplet Mining Strategies

  • Semi-hard: Balanced difficulty (default)
  • Hardest: Maximum challenge
  • Random: Simple but less effective
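
All three strategies select negatives per batch from the pairwise distance matrix: "hardest" takes the closest negative for each anchor, while "semi-hard" prefers negatives that are farther than the positive but still inside the margin, falling back to the hardest negative when none qualify. A compact sketch of that selection logic under those assumptions; `mine_triplet_loss` is illustrative, not the app's implementation:

```python
import torch
import torch.nn.functional as F

def mine_triplet_loss(emb, labels, margin=0.3, strategy="semi-hard"):
    """Batch triplet loss with hardest / semi-hard / random negative selection."""
    emb = F.normalize(emb, dim=1)                       # unit-length embeddings
    dist = torch.cdist(emb, emb)                        # (B, B) pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)   # True where labels match
    idx = torch.arange(len(labels), device=labels.device)
    losses = []
    for a in range(emb.size(0)):
        pos = torch.where(same[a] & (idx != a))[0]
        neg = torch.where(~same[a])[0]
        if len(pos) == 0 or len(neg) == 0:
            continue                                    # anchor has no valid triplet
        d_ap = dist[a, pos].max()                       # hardest positive
        d_an = dist[a, neg]
        if strategy == "hardest":
            d_neg = d_an.min()
        elif strategy == "semi-hard":                   # farther than positive, inside margin
            ok = (d_an > d_ap) & (d_an < d_ap + margin)
            d_neg = d_an[ok].min() if ok.any() else d_an.min()
        else:                                           # "random"
            d_neg = d_an[torch.randint(len(neg), (1,)).item()]
        losses.append(F.relu(d_ap - d_neg + margin))
    return torch.stack(losses).mean() if losses else emb.sum() * 0.0

emb = torch.randn(16, 512)              # stand-in for a batch of item embeddings
labels = torch.randint(0, 4, (16,))     # stand-in for outfit/category IDs
loss = mine_triplet_loss(emb, labels, margin=0.3, strategy="semi-hard")
```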

Data Augmentation

  • Minimal: Basic transforms
  • Standard: Balanced augmentation (default)
  • Aggressive: Heavy augmentation
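
The three levels roughly translate into progressively heavier torchvision pipelines. The exact transforms and parameters Dressify uses may differ; this sketch only illustrates the idea:

```python
from torchvision import transforms as T

# ImageNet statistics, the usual choice for ResNet/ViT backbones.
NORMALIZE = T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])

AUGMENTATIONS = {
    "minimal": T.Compose([
        T.Resize((224, 224)),
        T.ToTensor(), NORMALIZE,
    ]),
    "standard": T.Compose([
        T.RandomResizedCrop(224, scale=(0.8, 1.0)),
        T.RandomHorizontalFlip(),
        T.ColorJitter(brightness=0.2, contrast=0.2),
        T.ToTensor(), NORMALIZE,
    ]),
    "aggressive": T.Compose([
        T.RandomResizedCrop(224, scale=(0.6, 1.0)),
        T.RandomHorizontalFlip(),
        T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
        T.RandomGrayscale(p=0.1),
        T.ToTensor(), NORMALIZE,
        T.RandomErasing(p=0.25),            # operates on the tensor, so it comes last
    ]),
}
```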

πŸ“ Best Practices

1. Document Everything πŸ“š

  • Save parameter combinations
  • Record training results
  • Note hardware specifications

2. Start Small 🔬

  • Test with a few epochs first
  • Validate promising combinations
  • Scale up gradually

3. Monitor Resources 💻

  • Watch GPU memory usage
  • Check training time per epoch
  • Balance quality vs. speed

4. Save Checkpoints 💾

  • Models are saved automatically
  • Keep intermediate checkpoints
  • Download final models
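
The app writes checkpoints for you, but if you want to save or reload intermediate state yourself, the usual torch.save / torch.load pattern applies. The paths, helper names, and dictionary keys below are illustrative:

```python
import os

import torch

def save_checkpoint(model, optimizer, epoch, path="checkpoints/epoch_{:03d}.pt"):
    """Write model + optimizer state so training can resume from this epoch."""
    filename = path.format(epoch)
    os.makedirs(os.path.dirname(filename), exist_ok=True)
    torch.save({"epoch": epoch,
                "model_state": model.state_dict(),
                "optimizer_state": optimizer.state_dict()}, filename)

def load_checkpoint(model, optimizer, path):
    """Restore a saved run; returns the epoch at which it stopped."""
    ckpt = torch.load(path, map_location="cpu")
    model.load_state_dict(ckpt["model_state"])
    optimizer.load_state_dict(ckpt["optimizer_state"])
    return ckpt["epoch"]
```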

🚨 Common Issues & Solutions

Training Too Slow

  • Increase batch size (if GPU memory allows)
  • Increase learning rate
  • Use mixed precision
  • Reduce embedding dimension

Training Unstable

  • Reduce learning rate
  • Increase batch size
  • Enable gradient clipping (sketched below)
  • Check data quality
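
Gradient clipping rescales gradients whose global norm exceeds a threshold, which is usually what tames loss spikes. A minimal, self-contained sketch of where the call sits relative to backward() and step(); the throwaway model and max_norm value are illustrative:

```python
import torch
import torch.nn as nn

# Tiny self-contained example: one clipped update on a throwaway model.
model = nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x, y = torch.randn(4, 8), torch.randn(4, 1)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()

# Rescale gradients so their global L2 norm is at most max_norm, then step.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()
```

If mixed precision is enabled, call scaler.unscale_(optimizer) before clipping so the norm is measured on the real (unscaled) gradients.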

Out of Memory

  • Reduce batch size
  • Reduce embedding dimension
  • Use mixed precision
  • Reduce transformer layers

Poor Results

  • Increase epochs
  • Adjust learning rate
  • Try different optimizers
  • Check data preprocessing

📚 Next Steps

1. Read the Full Guide

  • See TRAINING_PARAMETERS.md for detailed explanations
  • Understand parameter impact and trade-offs

2. Run Experiments

  • Start with quick training
  • Experiment with different parameters
  • Document your findings

3. Optimize for Your Use Case

  • Balance quality vs. speed
  • Consider hardware constraints
  • Aim for reproducible results

4. Share Results

  • Document successful configurations
  • Share insights with the community
  • Contribute to best practices

🎉 You're ready to start experimenting!

Remember: Start simple, change one thing at a time, and document everything. Happy training! 🚀