Dressify Training Parameters Guide
Overview
The Dressify system provides comprehensive parameter control for training both the ResNet item embedder and the ViT outfit encoder. This guide covers all the "knobs" you can tweak to experiment with different training configurations.
ResNet Item Embedder Parameters
Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| Backbone Architecture | resnet50, resnet101 | resnet50 | Base CNN architecture for feature extraction |
| Embedding Dimension | 128-1024 | 512 | Output embedding vector size (must match ViT input) |
| Use ImageNet Pretrained | true/false | true | Initialize with ImageNet weights |
| Dropout Rate | 0.0-0.5 | 0.1 | Dropout in projection head for regularization |
Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| Epochs | 1-100 | 20 | Total passes over the training set |
| Batch Size | 8-128 | 64 | Images per training batch |
| Learning Rate | 1e-5 to 1e-2 | 1e-3 | Step size for gradient descent |
| Optimizer | adamw, adam, sgd, rmsprop | adamw | Optimization algorithm |
| Weight Decay | 1e-6 to 1e-2 | 1e-4 | L2 regularization strength |
| Triplet Margin | 0.1-1.0 | 0.2 | Distance margin for triplet loss |
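The tables above correspond to a fairly standard PyTorch setup. Below is a minimal sketch of such an item embedder, assuming a torchvision backbone with a small projection head; the class name `ItemEmbedder` and the head layout are illustrative assumptions, while the hyperparameter names and defaults come from the tables.

```python
# Minimal sketch only; ItemEmbedder and the head layout are assumptions,
# the hyperparameters mirror the defaults above.
import torch
import torch.nn as nn
from torchvision import models

class ItemEmbedder(nn.Module):
    def __init__(self, backbone: str = "resnet50", embedding_dim: int = 512,
                 pretrained: bool = True, dropout: float = 0.1):
        super().__init__()
        weights = "IMAGENET1K_V1" if pretrained else None
        resnet = getattr(models, backbone)(weights=weights)
        feat_dim = resnet.fc.in_features        # 2048 for resnet50/101
        resnet.fc = nn.Identity()               # keep the pooled features
        self.backbone = resnet
        self.head = nn.Sequential(              # projection head with dropout
            nn.Linear(feat_dim, embedding_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(embedding_dim, embedding_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        emb = self.head(self.backbone(x))
        return nn.functional.normalize(emb, dim=-1)   # unit-length embeddings

model = ItemEmbedder()
criterion = nn.TripletMarginLoss(margin=0.2)          # default margin from the table
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
```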
ViT Outfit Encoder Parameters
Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| Embedding Dimension | 128-1024 | 512 | Input embedding size (must match ResNet output) |
| Transformer Layers | 2-12 | 6 | Number of transformer encoder layers |
| Attention Heads | 4-16 | 8 | Attention heads per transformer layer |
| Feed-Forward Multiplier | 2-8 | 4 | Feed-forward hidden size as a multiple of the embedding dimension |
| Dropout Rate | 0.0-0.5 | 0.1 | Dropout in transformer layers |
Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| Epochs | 1-100 | 30 | Total passes over the training set |
| Batch Size | 4-64 | 32 | Outfits per training batch |
| Learning Rate | 1e-5 to 1e-2 | 5e-4 | Step size for gradient descent |
| Optimizer | adamw, adam, sgd, rmsprop | adamw | Optimization algorithm |
| Weight Decay | 1e-4 to 1e-1 | 5e-2 | L2 regularization strength |
| Triplet Margin | 0.1-1.0 | 0.3 | Distance margin for triplet loss |
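A matching sketch of the outfit encoder, assuming it is a plain `nn.TransformerEncoder` over the per-item embeddings with a learned pooling token; the class name `OutfitEncoder` and the CLS-style pooling are assumptions, the hyperparameters mirror the defaults above.

```python
# Minimal sketch only; CLS-token pooling is an assumption, not the project's spec.
import torch
import torch.nn as nn

class OutfitEncoder(nn.Module):
    def __init__(self, embedding_dim: int = 512, num_layers: int = 6,
                 num_heads: int = 8, ff_mult: int = 4, dropout: float = 0.1):
        super().__init__()
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embedding_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embedding_dim,
            nhead=num_heads,
            dim_feedforward=ff_mult * embedding_dim,  # feed-forward multiplier
            dropout=dropout,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, item_embeddings: torch.Tensor) -> torch.Tensor:
        # item_embeddings: (batch, num_items, embedding_dim) from the item embedder
        cls = self.cls_token.expand(item_embeddings.size(0), -1, -1)
        tokens = torch.cat([cls, item_embeddings], dim=1)
        encoded = self.encoder(tokens)
        return encoded[:, 0]                          # outfit-level embedding

outfit = OutfitEncoder()(torch.randn(2, 4, 512))      # 2 outfits of 4 items each
print(outfit.shape)                                   # torch.Size([2, 512])
```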
Advanced Training Settings
Hardware Optimization
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| Mixed Precision (AMP) | true/false | true | Use automatic mixed precision for faster training |
| Channels Last Memory | true/false | true | Use channels_last memory format for CUDA optimization |
| Gradient Clipping | 0.1-5.0 | 1.0 | Maximum gradient norm; prevents exploding gradients |
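These three settings interact inside a single training step roughly as follows. This is a sketch under the assumption of a triplet objective; `train_step` and its arguments are purely illustrative, and the model is assumed to already be on the device in channels_last format.

```python
# Illustrative AMP + channels_last + gradient-clipping step; not the project's exact loop.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

def train_step(model, optimizer, criterion, anchor, positive, negative,
               clip_norm: float = 1.0) -> float:
    # channels_last is a memory-format hint that speeds up convolutions on CUDA
    anchor = anchor.to(device, memory_format=torch.channels_last)
    positive = positive.to(device, memory_format=torch.channels_last)
    negative = negative.to(device, memory_format=torch.channels_last)

    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=(device == "cuda")):  # mixed precision
        loss = criterion(model(anchor), model(positive), model(negative))

    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)                      # so clipping sees unscaled gradients
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip_norm)
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```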
Learning Rate Scheduling
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| Warmup Epochs | 0-10 | 3 | Gradual learning rate increase at the start of training |
| Learning Rate Scheduler | cosine, step, plateau, linear | cosine | LR decay strategy |
| Early Stopping Patience | 5-20 | 10 | Epochs without improvement before training stops |
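One common way to combine the warmup and cosine options above is with stock PyTorch schedulers; the `start_factor` and the per-epoch stepping below are assumptions, not Dressify settings.

```python
# Linear warmup followed by cosine decay, stepped once per epoch.
import torch

def build_scheduler(optimizer, warmup_epochs: int = 3, total_epochs: int = 30):
    warmup = torch.optim.lr_scheduler.LinearLR(
        optimizer, start_factor=0.1, total_iters=warmup_epochs)
    cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=total_epochs - warmup_epochs)
    return torch.optim.lr_scheduler.SequentialLR(
        optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])
```

Call `scheduler.step()` once at the end of each epoch.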
Training Strategy
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| Triplet Mining Strategy | semi_hard, hardest, random | semi_hard | Negative sample selection method |
| Data Augmentation Level | minimal, standard, aggressive | standard | Image augmentation intensity |
| Random Seed | 0-9999 | 42 | Seed for reproducible training runs |
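For reference, `semi_hard` picks negatives that are farther from the anchor than the positive but still inside the margin band, `hardest` picks the closest negative outright, and `random` samples any negative. Below is a compact batch-wise sketch of the semi-hard case, assuming L2-normalised embeddings, integer labels, and at least one positive and one in-band negative per anchor; it is not necessarily Dressify's exact implementation.

```python
# Semi-hard negative mining within a batch (sketch; a real implementation
# needs fallbacks when the margin band is empty for an anchor).
import torch

def semi_hard_negatives(embeddings: torch.Tensor, labels: torch.Tensor,
                        margin: float = 0.2) -> torch.Tensor:
    """For each anchor, pick a negative with d(a,p) < d(a,n) < d(a,p) + margin."""
    dist = torch.cdist(embeddings, embeddings)             # pairwise distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    # hardest (farthest) positive distance per anchor
    pos_dist = dist.masked_fill(~same | eye, float("-inf")).max(dim=1).values
    # candidates: different label, farther than the positive, but inside the margin band
    band = (~same) & (dist > pos_dist.unsqueeze(1)) \
                   & (dist < (pos_dist + margin).unsqueeze(1))
    masked = dist.masked_fill(~band, float("inf"))
    return masked.argmin(dim=1)                            # closest semi-hard negative
```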
Parameter Impact Analysis
High Impact Parameters (Experiment First)
1. Learning Rate
- Too High: Training instability, loss spikes
- Too Low: Slow convergence, stuck in local minima
- Sweet Spot: 1e-3 for ResNet, 5e-4 for ViT
- Try: 1e-4, 1e-3, 5e-3, 1e-2
2. Batch Size
- Small: Noisier gradients, often better generalization, slower wall-clock training
- Large: Faster throughput, but may generalize worse
- Memory Constraint: GPU VRAM limits the maximum size
- Try: 16, 32, 64, 128
3. Triplet Margin
- Small: Looser separation requirement, easier to satisfy, faster convergence
- Large: Stricter separation, potentially better embeddings but harder to optimize
- Balance: 0.2-0.3 typically works well
- Try: 0.1, 0.2, 0.3, 0.5
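To make the margin concrete: the per-triplet loss is max(d(a,p) - d(a,n) + margin, 0), so a triplet that already satisfies a small margin can still incur loss under a larger one. A toy 2-D illustration with hand-picked points:

```python
# How the margin enters the triplet loss (toy, hand-picked 2-D embeddings).
import torch
import torch.nn.functional as F

anchor   = torch.tensor([[1.0, 0.0]])
positive = torch.tensor([[0.9, 0.1]])   # close to the anchor
negative = torch.tensor([[0.8, 0.3]])   # only slightly farther away

for margin in (0.1, 0.2, 0.3, 0.5):
    loss = F.triplet_margin_loss(anchor, positive, negative, margin=margin)
    # small margins are already satisfied (loss 0); larger margins are not
    print(f"margin={margin}: loss={loss.item():.3f}")
```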
Medium Impact Parameters
4. Embedding Dimension
- Small: Faster inference, less expressive
- Large: More expressive, slower inference
- Trade-off: 512 is a good balance
- Try: 256, 512, 768, 1024
5. Transformer Layers
- Few: Faster training, less capacity
- Many: More capacity, slower training
- Sweet Spot: 4-8 layers
- Try: 4, 6, 8, 12
6. Optimizer Choice
- AdamW: Best for most cases (default)
- Adam: Good alternative
- SGD: Better generalization, slower convergence
- RMSprop: Alternative to Adam
Low Impact Parameters (Fine-tune Last)
7. Weight Decay
- Small: Less regularization
- Large: More regularization
- Default: 1e-4 (ResNet), 5e-2 (ViT)
8. Dropout Rate
- Small: Less regularization
- Large: More regularization
- Default: 0.1 for both models
9. Attention Heads
- Rule: Must divide the embedding dimension evenly
- Default: 8 heads for 512 dimensions
- Try: 4, 8, 16
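This rule is enforced by PyTorch itself, so an invalid combination fails immediately:

```python
# embed_dim must be divisible by num_heads in nn.MultiheadAttention
# (and therefore in nn.TransformerEncoderLayer).
import torch.nn as nn

nn.MultiheadAttention(embed_dim=512, num_heads=8)      # ok: 64 dims per head
try:
    nn.MultiheadAttention(embed_dim=512, num_heads=12)
except AssertionError as err:
    print("invalid head count:", err)                  # 512 is not divisible by 12
```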
Recommended Parameter Combinations
Quick Experimentation
```yaml
resnet_epochs: 5
vit_epochs: 10
batch_size: 16
learning_rate: 1e-3
```
Balanced Training
```yaml
resnet_epochs: 20
vit_epochs: 30
batch_size: 64
learning_rate: 1e-3
triplet_margin: 0.2
```
High Quality Training
```yaml
resnet_epochs: 50
vit_epochs: 100
batch_size: 32
learning_rate: 5e-4
triplet_margin: 0.3
warmup_epochs: 5
```
Research Experiments
```yaml
resnet_backbone: resnet101
embedding_dim: 768
transformer_layers: 8
attention_heads: 12
mining_strategy: hardest
augmentation_level: aggressive
```
Parameter Tuning Workflow
1. Baseline Training
```bash
./scripts/train_item.sh
./scripts/train_outfit.sh
```
2. Learning Rate Sweep
```yaml
learning_rates: [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
epochs: 5
```
3. Architecture Search
```yaml
embedding_dims: [256, 512, 768, 1024]
transformer_layers: [4, 6, 8, 12]
```
4. Training Strategy
```yaml
mining_strategies: [random, semi_hard, hardest]
augmentation_levels: [minimal, standard, aggressive]
```
5. Hyperparameter Optimization
```yaml
learning_rate: [4e-4, 5e-4, 6e-4]
batch_size: [24, 32, 40]
triplet_margin: [0.25, 0.3, 0.35]
```
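Steps 2 and 5 both come down to a small grid search over short runs. Here is a minimal sketch of such a loop; `train_and_eval` is a hypothetical placeholder for your actual short-training entry point, not a Dressify API.

```python
# Hypothetical sweep loop; train_and_eval is a placeholder stub.
from itertools import product

def train_and_eval(learning_rate: float, batch_size: int, epochs: int) -> float:
    """Placeholder: run a short training job and return a validation metric."""
    return 0.0  # replace with a call to your real training script

learning_rates = [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
batch_sizes = [32]                        # widen once the LR range is narrowed
results = {}

for lr, bs in product(learning_rates, batch_sizes):
    results[(lr, bs)] = train_and_eval(learning_rate=lr, batch_size=bs, epochs=5)

best = max(results, key=results.get)
print("best config:", best, "score:", results[best])
```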
Monitoring Training Progress
Key Metrics to Watch
- Training Loss: Should decrease steadily
- Validation Loss: Should decrease without overfitting
- Triplet Accuracy: Should increase over time
- Embedding Quality: Check with t-SNE visualization
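For the t-SNE check, a small sketch assuming you have a matrix of item embeddings and their category labels; it uses scikit-learn and matplotlib, which are not necessarily part of the Dressify environment.

```python
# Project embeddings to 2-D with t-SNE and colour by category.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_embeddings(embeddings: np.ndarray, labels: np.ndarray,
                    path: str = "tsne.png") -> None:
    coords = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(embeddings)
    plt.figure(figsize=(6, 6))
    for label in np.unique(labels):
        mask = labels == label
        plt.scatter(coords[mask, 0], coords[mask, 1], s=4, label=str(label))
    plt.legend(markerscale=3)
    plt.savefig(path, dpi=150)

# Well-trained embeddings show tight, well-separated clusters per category.
```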
Early Stopping Signs
- Loss plateaus for 5+ epochs
- Validation loss increases while training loss decreases
- Triplet accuracy stops improving
Success Indicators
- Smooth loss curves
- Consistent improvement in metrics
- Good generalization (validation ≈ training)
Advanced Parameter Combinations
Memory-Constrained Training
```yaml
batch_size: 16
embedding_dim: 256
transformer_layers: 4
use_mixed_precision: true
channels_last: true
```
High-Speed Training
```yaml
epochs: 10
batch_size: 128
learning_rate: 2e-3
warmup_epochs: 1
early_stopping_patience: 5
```
Maximum Quality Training
```yaml
epochs: 100
batch_size: 32
learning_rate: 1e-4
warmup_epochs: 10
early_stopping_patience: 20
mining_strategy: hardest
augmentation_level: aggressive
```
Parameter Logging
Save Your Experiments
Track Changes
```yaml
experiment_001:
  changes: "Increased embedding_dim from 512 to 768"
  results: "Better triplet accuracy, slower training"
  next_steps: "Try reducing learning rate"

experiment_002:
  changes: "Changed mining_strategy to hardest"
  results: "Harder training, better embeddings"
  next_steps: "Increase triplet_margin"
```
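A minimal helper for appending records like the ones above to a YAML log; it assumes PyYAML is installed, and the file name and field names are suggestions, not part of Dressify.

```python
# Append one experiment record to a running YAML log.
from datetime import datetime
import yaml

def log_experiment(name: str, changes: str, results: str, next_steps: str,
                   path: str = "experiments.yaml") -> None:
    record = {name: {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        "changes": changes,
        "results": results,
        "next_steps": next_steps,
    }}
    with open(path, "a") as handle:
        yaml.safe_dump(record, handle, sort_keys=False)

log_experiment("experiment_003",
               changes="Raised triplet_margin from 0.3 to 0.35",
               results="TBD",
               next_steps="Compare triplet accuracy against experiment_002")
```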
Pro Tips
1. Start Simple
- Begin with default parameters
- Change one parameter at a time
- Document every change
2. Use Quick Training
- Test parameters with 1-5 epochs first
- Validate promising combinations with full training
- Avoid spending full runs on bad parameter combinations
3. Monitor Resources
- Watch GPU memory usage
- Monitor training time per epoch
- Balance quality vs. speed
4. Validate Changes
- Always check validation metrics
- Compare with baseline performance
- Ensure improvements are consistent
5. Save Everything
- Keep all experiment configs
- Save intermediate checkpoints
- Log training curves and metrics
Happy Parameter Tuning!
Remember: The best parameters depend on your specific dataset, hardware, and requirements. Experiment systematically and document everything!