
🎯 Dressify Training Parameters Guide

Overview

The Dressify system provides comprehensive parameter control for training both the ResNet item embedder and the ViT outfit encoder. This guide covers all the "knobs" you can adjust to experiment with different training configurations.

🖼️ ResNet Item Embedder Parameters

Model Architecture

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| Backbone Architecture | resnet50, resnet101 | resnet50 | Base CNN architecture for feature extraction |
| Embedding Dimension | 128-1024 | 512 | Output embedding vector size (must match ViT input) |
| Use ImageNet Pretrained | true/false | true | Initialize with ImageNet weights |
| Dropout Rate | 0.0-0.5 | 0.1 | Dropout in projection head for regularization |
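
The table above maps naturally onto a small PyTorch module. Below is a minimal sketch, assuming a torchvision backbone and a two-layer projection head; the class name `ItemEmbedder` and this exact layout are illustrative, not the actual Dressify code.

```python
import torch.nn as nn
from torchvision import models


class ItemEmbedder(nn.Module):
    """Illustrative ResNet item embedder: backbone + projection head (not the actual Dressify code)."""

    def __init__(self, backbone="resnet50", embedding_dim=512, pretrained=True, dropout=0.1):
        super().__init__()
        weights = "IMAGENET1K_V1" if pretrained else None     # torchvision >= 0.13 weight names
        resnet = getattr(models, backbone)(weights=weights)   # resnet50 or resnet101
        in_features = resnet.fc.in_features                   # 2048 for resnet50/101
        resnet.fc = nn.Identity()                             # keep only the pooled backbone features
        self.backbone = resnet
        self.head = nn.Sequential(                            # projection head with dropout
            nn.Linear(in_features, embedding_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(embedding_dim, embedding_dim),
        )

    def forward(self, x):
        emb = self.head(self.backbone(x))
        return nn.functional.normalize(emb, dim=-1)           # L2-normalize for triplet distances
```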

Training Parameters

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| Epochs | 1-100 | 20 | Total training iterations |
| Batch Size | 8-128 | 64 | Images per training batch |
| Learning Rate | 1e-5 to 1e-2 | 1e-3 | Step size for gradient descent |
| Optimizer | adamw, adam, sgd, rmsprop | adamw | Optimization algorithm |
| Weight Decay | 1e-6 to 1e-2 | 1e-4 | L2 regularization strength |
| Triplet Margin | 0.1-1.0 | 0.2 | Distance margin for triplet loss |
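
These training knobs correspond almost one-to-one to standard PyTorch objects. A hedged sketch using the table defaults and the illustrative `ItemEmbedder` from above (not the project's real training loop):

```python
import torch
import torch.nn as nn

model = ItemEmbedder()                                   # illustrative embedder from the sketch above

# Optimizer, learning rate, and weight decay straight from the table defaults.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)

# Triplet loss with the default 0.2 margin; assumes L2-normalized embeddings.
triplet_loss = nn.TripletMarginLoss(margin=0.2, p=2)


def train_step(anchor, positive, negative):
    """One illustrative step on an (anchor, positive, negative) batch of item images."""
    optimizer.zero_grad()
    loss = triplet_loss(model(anchor), model(positive), model(negative))
    loss.backward()
    optimizer.step()
    return loss.item()
```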

🧠 ViT Outfit Encoder Parameters

Model Architecture

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| Embedding Dimension | 128-1024 | 512 | Input embedding size (must match ResNet output) |
| Transformer Layers | 2-12 | 6 | Number of transformer encoder layers |
| Attention Heads | 4-16 | 8 | Number of attention heads per layer |
| Feed-Forward Multiplier | 2-8 | 4 | Feed-forward hidden size as a multiple of the embedding dimension |
| Dropout Rate | 0.0-0.5 | 0.1 | Dropout in transformer layers |
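
A rough sketch of how these knobs could parameterize PyTorch's `nn.TransformerEncoder` over a set of item embeddings. The class name `OutfitEncoder` and the mean-pooling readout are assumptions for illustration, not the actual Dressify model:

```python
import torch.nn as nn


class OutfitEncoder(nn.Module):
    """Illustrative outfit encoder over item embeddings (not the actual Dressify model)."""

    def __init__(self, embedding_dim=512, num_layers=6, num_heads=8, ff_mult=4, dropout=0.1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embedding_dim,
            nhead=num_heads,                          # must divide embedding_dim evenly
            dim_feedforward=ff_mult * embedding_dim,  # feed-forward multiplier
            dropout=dropout,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, item_embeddings, padding_mask=None):
        # item_embeddings: (batch, num_items, embedding_dim) produced by the item embedder
        encoded = self.encoder(item_embeddings, src_key_padding_mask=padding_mask)
        return encoded.mean(dim=1)                    # pool items into a single outfit embedding
```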

Training Parameters

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| Epochs | 1-100 | 30 | Total training iterations |
| Batch Size | 4-64 | 32 | Outfits per training batch |
| Learning Rate | 1e-5 to 1e-2 | 5e-4 | Step size for gradient descent |
| Optimizer | adamw, adam, sgd, rmsprop | adamw | Optimization algorithm |
| Weight Decay | 1e-4 to 1e-1 | 5e-2 | L2 regularization strength |
| Triplet Margin | 0.1-1.0 | 0.3 | Distance margin for triplet loss |

⚙️ Advanced Training Settings

Hardware Optimization

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| Mixed Precision (AMP) | true/false | true | Use automatic mixed precision for faster training |
| Channels Last Memory | true/false | true | Use the channels_last memory format for CUDA optimization |
| Gradient Clipping | 0.1-5.0 | 1.0 | Maximum gradient norm; prevents exploding gradients |
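
A hedged sketch of how these three settings usually plug into a PyTorch training step; it reuses the illustrative `model`, `optimizer`, and `triplet_loss` from the sketches above:

```python
import torch

scaler = torch.cuda.amp.GradScaler(enabled=True)                 # Mixed Precision (AMP)
model = model.to("cuda", memory_format=torch.channels_last)      # Channels Last Memory


def amp_train_step(anchor, positive, negative, max_grad_norm=1.0):
    batch = [x.to("cuda", memory_format=torch.channels_last) for x in (anchor, positive, negative)]
    optimizer.zero_grad(set_to_none=True)
    with torch.cuda.amp.autocast():                              # forward pass in mixed precision
        emb_a, emb_p, emb_n = (model(x) for x in batch)
        loss = triplet_loss(emb_a, emb_p, emb_n)
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                                   # unscale before clipping
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)  # Gradient Clipping
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```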

Learning Rate Scheduling

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| Warmup Epochs | 0-10 | 3 | Gradual learning rate increase at the start of training |
| Learning Rate Scheduler | cosine, step, plateau, linear | cosine | LR decay strategy |
| Early Stopping Patience | 5-20 | 10 | Stop training after this many epochs without improvement |
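
Warmup followed by cosine decay can be expressed with PyTorch's built-in schedulers. A minimal sketch, assuming an `optimizer` like the one above and one scheduler step per epoch:

```python
import torch

warmup_epochs, total_epochs = 3, 20
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=warmup_epochs)      # ramp the LR up during warmup
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_epochs - warmup_epochs)               # then decay it along a cosine curve
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

for epoch in range(total_epochs):
    # ... run one training epoch here ...
    scheduler.step()                                             # advance the schedule once per epoch
```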

Training Strategy

| Parameter | Range | Default | Description |
| --- | --- | --- | --- |
| Triplet Mining Strategy | semi_hard, hardest, random | semi_hard | Negative sample selection method |
| Data Augmentation Level | minimal, standard, aggressive | standard | Image augmentation intensity |
| Random Seed | 0-9999 | 42 | Reproducible training results |
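
For the mining strategy, here is an illustrative batch-wise sketch of semi-hard negative selection. It assumes L2-normalized embeddings and per-item `labels` marking which items count as positives for each other; the real Dressify miner may differ:

```python
import torch


def semi_hard_negatives(embeddings, labels, margin=0.2):
    """For each anchor, pick a negative with d(a,p) < d(a,n) < d(a,p) + margin when one exists."""
    dists = torch.cdist(embeddings, embeddings)                  # pairwise L2 distances in the batch
    same = labels.unsqueeze(0) == labels.unsqueeze(1)            # mask of same-label (positive) pairs
    chosen = []
    for a in range(len(labels)):
        positives = same[a].clone()
        positives[a] = False
        if not positives.any():                                  # no positive in batch: placeholder
            chosen.append(a)
            continue
        d_ap = dists[a][positives].min()                         # distance to the closest positive
        candidates = (~same[a]) & (dists[a] > d_ap) & (dists[a] < d_ap + margin)
        if candidates.any():                                     # closest semi-hard negative
            chosen.append(int(dists[a].masked_fill(~candidates, float("inf")).argmin()))
        else:                                                    # fall back to the hardest negative
            chosen.append(int(dists[a].masked_fill(same[a], float("inf")).argmin()))
    return torch.tensor(chosen)
```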

🔬 Parameter Impact Analysis

High Impact Parameters (Experiment First)

1. Learning Rate 🎯

  • Too High: Training instability, loss spikes
  • Too Low: Slow convergence, stuck in local minima
  • Sweet Spot: 1e-3 for ResNet, 5e-4 for ViT
  • Try: 1e-4, 1e-3, 5e-3, 1e-2

2. Batch Size 📦

  • Small: Better generalization, slower training
  • Large: Faster training, but may generalize worse
  • Memory Constraint: GPU VRAM limits maximum size
  • Try: 16, 32, 64, 128

3. Triplet Margin 📏

  • Small: Easier constraint to satisfy, faster convergence
  • Large: Stricter separation, potentially better embeddings
  • Balance: 0.2-0.3 typically optimal
  • Try: 0.1, 0.2, 0.3, 0.5

Medium Impact Parameters

4. Embedding Dimension 🔢

  • Small: Faster inference, less expressive
  • Large: More expressive, slower inference
  • Trade-off: 512 is a good balance
  • Try: 256, 512, 768, 1024

5. Transformer Layers 🏗️

  • Few: Faster training, less capacity
  • Many: More capacity, slower training
  • Sweet Spot: 4-8 layers
  • Try: 4, 6, 8, 12

6. Optimizer Choice ⚡

  • AdamW: Best for most cases (default)
  • Adam: Good alternative
  • SGD: Better generalization, slower convergence
  • RMSprop: Alternative to Adam

Low Impact Parameters (Fine-tune Last)

7. Weight Decay 🛡️

  • Small: Less regularization
  • Large: More regularization
  • Default: 1e-4 (ResNet), 5e-2 (ViT)

8. Dropout Rate 💧

  • Small: Less regularization
  • Large: More regularization
  • Default: 0.1 for both models

9. Attention Heads 👁️

  • Rule: Must divide the embedding dimension evenly (see the check after this list)
  • Default: 8 heads for 512 dimensions
  • Try: 4, 8, 16
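
Because multi-head attention splits the embedding across heads, the rule is easy to check up front:

```python
embed_dim, num_heads = 512, 8
assert embed_dim % num_heads == 0, "attention heads must divide the embedding dimension"
head_dim = embed_dim // num_heads   # 64 dimensions per head with the defaults
```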

🚀 Recommended Parameter Combinations

Quick Experimentation

```yaml
# Fast Training (Low Quality)
resnet_epochs: 5
vit_epochs: 10
batch_size: 16
learning_rate: 1e-3
```

Balanced Training

```yaml
# Standard Quality (Default)
resnet_epochs: 20
vit_epochs: 30
batch_size: 64
learning_rate: 1e-3
triplet_margin: 0.2
```

High Quality Training

```yaml
# High Quality (Longer Training)
resnet_epochs: 50
vit_epochs: 100
batch_size: 32
learning_rate: 5e-4
triplet_margin: 0.3
warmup_epochs: 5
```

Research Experiments

```yaml
# Research Configuration
resnet_backbone: resnet101
embedding_dim: 768
transformer_layers: 8
attention_heads: 12
mining_strategy: hardest
augmentation_level: aggressive
```

📊 Parameter Tuning Workflow

1. Baseline Training 📈

```bash
# Start with default parameters
./scripts/train_item.sh
./scripts/train_outfit.sh
```

2. Learning Rate Sweep 🔍

```yaml
# Test different learning rates
learning_rates: [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
epochs: 5  # Quick test
```
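
A sweep like this can be driven by a small loop. `run_training` below is a hypothetical helper standing in for whatever training entry point your setup exposes; it is assumed to return the best validation loss of a short run:

```python
results = {}
for lr in [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]:
    # run_training is hypothetical: train briefly and return the best validation loss.
    val_loss = run_training(learning_rate=lr, epochs=5)
    results[lr] = val_loss
    print(f"lr={lr:.0e}  val_loss={val_loss:.4f}")

best_lr = min(results, key=results.get)
print(f"Best learning rate from the quick sweep: {best_lr:.0e}")
```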

3. Architecture Search 🏗️

```yaml
# Test different model sizes
embedding_dims: [256, 512, 768, 1024]
transformer_layers: [4, 6, 8, 12]
```

4. Training Strategy 🎯

```yaml
# Test different strategies
mining_strategies: [random, semi_hard, hardest]
augmentation_levels: [minimal, standard, aggressive]
```

5. Hyperparameter Optimization ⚡

```yaml
# Fine-tune best combinations
learning_rate: [4e-4, 5e-4, 6e-4]
batch_size: [24, 32, 40]
triplet_margin: [0.25, 0.3, 0.35]
```

🧪 Monitoring Training Progress

Key Metrics to Watch

  1. Training Loss: Should decrease steadily
  2. Validation Loss: Should decrease without overfitting
  3. Triplet Accuracy: Should increase over time
  4. Embedding Quality: Check with a t-SNE visualization (see the sketch below)
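
For the embedding-quality check, a hedged sketch using scikit-learn's t-SNE and matplotlib; `item_embeddings` and `category_labels` are assumed to come from your own validation data:

```python
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# item_embeddings: (N, 512) array from your validation set; category_labels: (N,) integer array.
coords = TSNE(n_components=2, perplexity=30, init="pca", random_state=42).fit_transform(item_embeddings)

plt.figure(figsize=(8, 8))
plt.scatter(coords[:, 0], coords[:, 1], c=category_labels, s=8, cmap="tab10")
plt.title("t-SNE of item embeddings (clusters should follow categories)")
plt.savefig("tsne_item_embeddings.png", dpi=150)
```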

Early Stopping Signs

  • Loss plateaus for 5+ epochs
  • Validation loss increases while training loss decreases
  • Triplet accuracy stops improving
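
These signs reduce to a simple patience counter. A minimal sketch, where `validate()` and `save_checkpoint()` are hypothetical helpers for your own loop:

```python
max_epochs, patience = 100, 10
best_val_loss, bad_epochs = float("inf"), 0

for epoch in range(max_epochs):
    val_loss = validate()                     # hypothetical: returns this epoch's validation loss
    if val_loss < best_val_loss - 1e-4:       # meaningful improvement
        best_val_loss, bad_epochs = val_loss, 0
        save_checkpoint("best.pt")            # hypothetical checkpoint helper
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"Early stopping at epoch {epoch}: no improvement for {patience} epochs")
            break
```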

Success Indicators

  • Smooth loss curves
  • Consistent improvement in metrics
  • Good generalization (validation ≈ training)

🔧 Advanced Parameter Combinations

Memory-Constrained Training

```yaml
# For limited GPU memory
batch_size: 16
embedding_dim: 256
transformer_layers: 4
use_mixed_precision: true
channels_last: true
```

High-Speed Training

```yaml
# For quick iterations
epochs: 10
batch_size: 128
learning_rate: 2e-3
warmup_epochs: 1
early_stopping_patience: 5
```

Maximum Quality Training

```yaml
# For production models
epochs: 100
batch_size: 32
learning_rate: 1e-4
warmup_epochs: 10
early_stopping_patience: 20
mining_strategy: hardest
augmentation_level: aggressive
```

📝 Parameter Logging

Save Your Experiments

```
# Each training run saves:
# - Custom config JSON
# - Training metrics
# - Model checkpoints
# - Training logs
```
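
A hedged sketch of what such logging can look like; the directory layout and file names here are illustrative, not the Space's actual output paths:

```python
import json
from pathlib import Path

run_dir = Path("runs/experiment_001")                     # illustrative directory layout
run_dir.mkdir(parents=True, exist_ok=True)

config = {"embedding_dim": 512, "learning_rate": 1e-3, "triplet_margin": 0.2}
(run_dir / "config.json").write_text(json.dumps(config, indent=2))


def log_epoch(epoch, train_loss, val_loss):
    """Append one epoch's metrics; the values come from your own training loop."""
    with (run_dir / "metrics.jsonl").open("a") as f:
        f.write(json.dumps({"epoch": epoch, "train_loss": train_loss, "val_loss": val_loss}) + "\n")
```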

Track Changes

```yaml
# Document parameter changes:
experiment_001:
  changes: "Increased embedding_dim from 512 to 768"
  results: "Better triplet accuracy, slower training"
  next_steps: "Try reducing learning rate"

experiment_002:
  changes: "Changed mining_strategy to hardest"
  results: "Harder training, better embeddings"
  next_steps: "Increase triplet_margin"
```

🎯 Pro Tips

1. Start Simple 🚀

  • Begin with default parameters
  • Change one parameter at a time
  • Document every change

2. Use Quick Training ⚡

  • Test parameters with 1-5 epochs first
  • Validate promising combinations with full training
  • Save time on bad parameter combinations

3. Monitor Resources 💻

  • Watch GPU memory usage
  • Monitor training time per epoch
  • Balance quality vs. speed

4. Validate Changes ✅

  • Always check validation metrics
  • Compare with baseline performance
  • Ensure improvements are consistent

5. Save Everything 💾

  • Keep all experiment configs
  • Save intermediate checkpoints
  • Log training curves and metrics

Happy Parameter Tuning! 🎉

Remember: The best parameters depend on your specific dataset, hardware, and requirements. Experiment systematically and document everything!