# Dressify Training Parameters Guide
## Overview
The Dressify system provides **comprehensive parameter control** for both the ResNet item embedder and the ViT outfit encoder. This guide covers all the "knobs" you can tweak to experiment with different training configurations.
## ResNet Item Embedder Parameters
### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Backbone Architecture** | `resnet50`, `resnet101` | `resnet50` | Base CNN architecture for feature extraction |
| **Embedding Dimension** | 128-1024 | 512 | Output embedding vector size (must match ViT input) |
| **Use ImageNet Pretrained** | `true`/`false` | `true` | Initialize with ImageNet weights |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in projection head for regularization |
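To make these architecture knobs concrete, here is a minimal PyTorch sketch of how a ResNet embedder with a projection head might be wired up. The class and argument names are illustrative assumptions, not Dressify's actual code:
```python
import torch.nn as nn
import torchvision.models as models

class ItemEmbedder(nn.Module):
    """Illustrative item embedder: ResNet backbone plus projection head."""
    def __init__(self, backbone="resnet50", embedding_dim=512,
                 pretrained=True, dropout=0.1):
        super().__init__()
        weights = "IMAGENET1K_V1" if pretrained else None
        resnet = getattr(models, backbone)(weights=weights)
        in_features = resnet.fc.in_features   # 2048 for resnet50/101
        resnet.fc = nn.Identity()             # drop the ImageNet classifier
        self.backbone = resnet
        self.head = nn.Sequential(            # projection head with dropout
            nn.Linear(in_features, embedding_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(embedding_dim, embedding_dim),
        )

    def forward(self, x):
        return self.head(self.backbone(x))
```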
### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 20 | Number of passes over the training set |
| **Batch Size** | 8-128 | 64 | Images per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 1e-3 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-6 to 1e-2 | 1e-4 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.2 | Distance margin for triplet loss |
## ViT Outfit Encoder Parameters
### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Embedding Dimension** | 128-1024 | 512 | Input embedding size (must match ResNet output) |
| **Transformer Layers** | 2-12 | 6 | Number of transformer encoder layers |
| **Attention Heads** | 4-16 | 8 | Number of attention heads per layer |
| **Feed-Forward Multiplier** | 2-8 | 4 | Feed-forward hidden size as a multiple of the embedding dimension |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in transformer layers |
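For intuition, here is a minimal sketch of how these knobs might map onto PyTorch's built-in transformer encoder; the sequence length, pooling choice, and variable names are assumptions:
```python
import torch
import torch.nn as nn

embedding_dim, n_layers, n_heads, ff_mult, dropout = 512, 6, 8, 4, 0.1

layer = nn.TransformerEncoderLayer(
    d_model=embedding_dim,
    nhead=n_heads,
    dim_feedforward=ff_mult * embedding_dim,  # feed-forward multiplier
    dropout=dropout,
    batch_first=True,
)
encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

items = torch.randn(32, 5, embedding_dim)     # 32 outfits, 5 items each
outfit_tokens = encoder(items)                # (32, 5, 512)
outfit_embedding = outfit_tokens.mean(dim=1)  # simple mean pooling
```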
### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 30 | Number of passes over the training set |
| **Batch Size** | 4-64 | 32 | Outfits per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 5e-4 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-4 to 1e-1 | 5e-2 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.3 | Distance margin for triplet loss |
## Advanced Training Settings
### Hardware Optimization
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Mixed Precision (AMP)** | `true`/`false` | `true` | Use automatic mixed precision for faster training |
| **Channels Last Memory** | `true`/`false` | `true` | Use `channels_last` memory format for CUDA optimization |
| **Gradient Clipping** | 0.1-5.0 | 1.0 | Maximum gradient norm, to prevent exploding gradients |
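These three settings typically combine in a single CUDA training step, as in the sketch below; the tiny model and random batch are placeholders for the real embedder and data loader:
```python
import torch
import torch.nn as nn

device = "cuda"
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.Flatten(), nn.LazyLinear(10))
model = model.to(device, memory_format=torch.channels_last)  # channels_last
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()

images = torch.randn(8, 3, 224, 224, device=device)
images = images.to(memory_format=torch.channels_last)
labels = torch.randint(0, 10, (8,), device=device)

optimizer.zero_grad(set_to_none=True)
with torch.cuda.amp.autocast():               # mixed precision (AMP)
    loss = nn.functional.cross_entropy(model(images), labels)
scaler.scale(loss).backward()
scaler.unscale_(optimizer)                    # unscale before clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)
scaler.update()
```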
### Learning Rate Scheduling
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Warmup Epochs** | 0-10 | 3 | Epochs of gradual learning rate ramp-up at the start |
| **Learning Rate Scheduler** | `cosine`, `step`, `plateau`, `linear` | `cosine` | LR decay strategy |
| **Early Stopping Patience** | 5-20 | 10 | Epochs without improvement before training stops |
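One common way to implement the default schedule (3 warmup epochs, then cosine decay) with stock PyTorch schedulers; the bare parameter below is a stand-in for a real model:
```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

params = [torch.nn.Parameter(torch.zeros(1))]
optimizer = torch.optim.AdamW(params, lr=1e-3)

warmup_epochs, total_epochs = 3, 20
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.1, total_iters=warmup_epochs),
        CosineAnnealingLR(optimizer, T_max=total_epochs - warmup_epochs),
    ],
    milestones=[warmup_epochs],
)

for epoch in range(total_epochs):
    # ... train for one epoch, evaluate, check early stopping ...
    scheduler.step()
```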
### Training Strategy
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Triplet Mining Strategy** | `semi_hard`, `hardest`, `random` | `semi_hard` | Negative sample selection method |
| **Data Augmentation Level** | `minimal`, `standard`, `aggressive` | `standard` | Image augmentation intensity |
| **Random Seed** | 0-9999 | 42 | Seed for reproducible training runs |
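For reference, the sketch below shows what semi-hard mining (the default strategy) computes: for each anchor, it prefers a negative that is farther away than the positive but still inside the margin band. This is a generic illustration rather than Dressify's implementation, and it assumes `candidates` contains only true negatives:
```python
import torch

def semi_hard_negatives(anchor, positive, candidates, margin=0.2):
    """Pick, per anchor, a negative with d(a,p) < d(a,n) < d(a,p) + margin."""
    d_ap = (anchor - positive).norm(dim=1, keepdim=True)  # (B, 1)
    d_an = torch.cdist(anchor, candidates)                # (B, N)
    band = (d_an > d_ap) & (d_an < d_ap + margin)         # semi-hard band
    # Prefer the closest in-band negative; fall back to the hardest
    # (overall closest) negative when the band is empty for an anchor.
    masked = torch.where(band, d_an, torch.full_like(d_an, float("inf")))
    idx = torch.where(band.any(dim=1), masked.argmin(dim=1), d_an.argmin(dim=1))
    return candidates[idx]
```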
## Parameter Impact Analysis
### High Impact Parameters (Experiment First)
#### 1. **Learning Rate**
- **Too High**: Training instability, loss spikes
- **Too Low**: Slow convergence, stuck in local minima
- **Sweet Spot**: 1e-3 for ResNet, 5e-4 for ViT
- **Try**: 1e-4, 1e-3, 5e-3, 1e-2
#### 2. **Batch Size**
- **Small**: Better generalization, slower training
- **Large**: Faster training, but often weaker generalization
- **Memory Constraint**: GPU VRAM limits the maximum size
- **Try**: 16, 32, 64, 128
#### 3. **Triplet Margin**
- **Small**: Easier triplets, faster convergence
- **Large**: Harder triplets, potentially better-separated embeddings
- **Balance**: 0.2-0.3 typically optimal
- **Try**: 0.1, 0.2, 0.3, 0.5 (see the loss sketch below)
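For reference, the margin plugs straight into PyTorch's built-in triplet loss, which computes max(d(a,p) - d(a,n) + margin, 0); the random tensors below stand in for real embeddings:
```python
import torch
import torch.nn as nn

triplet_loss = nn.TripletMarginLoss(margin=0.2)

anchor, positive, negative = (torch.randn(64, 512) for _ in range(3))
loss = triplet_loss(anchor, positive, negative)
```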
### Medium Impact Parameters
#### 4. **Embedding Dimension**
- **Small**: Faster inference, less expressive
- **Large**: More expressive, slower inference
- **Trade-off**: 512 is a good balance
- **Try**: 256, 512, 768, 1024
#### 5. **Transformer Layers**
- **Few**: Faster training, less capacity
- **Many**: More capacity, slower training
- **Sweet Spot**: 4-8 layers
- **Try**: 4, 6, 8, 12
#### 6. **Optimizer Choice**
- **AdamW**: Best for most cases (default)
- **Adam**: Good alternative
- **SGD**: Often better generalization, slower convergence
- **RMSprop**: Adaptive alternative to Adam (see the factory sketch below)
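A sketch of how the optimizer knob might map onto `torch.optim`; the factory function and the SGD momentum value are illustrative choices, not confirmed Dressify defaults:
```python
import torch

def build_optimizer(params, name="adamw", lr=1e-3, weight_decay=1e-4):
    """Map the optimizer setting to a torch.optim constructor."""
    if name == "adamw":
        return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)
    if name == "adam":
        return torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)
    if name == "sgd":  # momentum=0.9 is an assumed, conventional value
        return torch.optim.SGD(params, lr=lr, momentum=0.9,
                               weight_decay=weight_decay)
    if name == "rmsprop":
        return torch.optim.RMSprop(params, lr=lr, weight_decay=weight_decay)
    raise ValueError(f"unknown optimizer: {name}")
```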
### Low Impact Parameters (Fine-tune Last)
#### 7. **Weight Decay**
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 1e-4 (ResNet), 5e-2 (ViT)
#### 8. **Dropout Rate**
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 0.1 for both models
#### 9. **Attention Heads**
- **Rule**: The embedding dimension must be divisible by the number of heads (see the sketch below)
- **Default**: 8 heads for 512 dimensions (64 channels per head)
- **Try**: 4, 8, 16
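PyTorch enforces this rule itself, so a quick check is cheap:
```python
import torch.nn as nn

embedding_dim, num_heads = 512, 8
assert embedding_dim % num_heads == 0, "heads must divide the embedding dim"
# Each head then attends over embedding_dim // num_heads = 64 channels;
# nn.MultiheadAttention raises an error if the division is not exact.
attention = nn.MultiheadAttention(embedding_dim, num_heads, batch_first=True)
```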
## Recommended Parameter Combinations
### Quick Experimentation
```yaml
# Fast Training (Low Quality)
resnet_epochs: 5
vit_epochs: 10
batch_size: 16
learning_rate: 1e-3
```
### Balanced Training
```yaml
# Standard Quality (Default)
resnet_epochs: 20
vit_epochs: 30
batch_size: 64
learning_rate: 1e-3
triplet_margin: 0.2
```
### High Quality Training
```yaml
# High Quality (Longer Training)
resnet_epochs: 50
vit_epochs: 100
batch_size: 32
learning_rate: 5e-4
triplet_margin: 0.3
warmup_epochs: 5
```
### Research Experiments
```yaml
# Research Configuration
resnet_backbone: resnet101
embedding_dim: 768
transformer_layers: 8
attention_heads: 12
mining_strategy: hardest
augmentation_level: aggressive
```
## Parameter Tuning Workflow
### 1. **Baseline Training**
```bash
# Start with default parameters
./scripts/train_item.sh
./scripts/train_outfit.sh
```
### 2. **Learning Rate Sweep**
```yaml
# Test different learning rates
learning_rates: [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
epochs: 5 # Quick test
```
### 3. **Architecture Search**
```yaml
# Test different model sizes
embedding_dims: [256, 512, 768, 1024]
transformer_layers: [4, 6, 8, 12]
```
### 4. **Training Strategy**
```yaml
# Test different strategies
mining_strategies: [random, semi_hard, hardest]
augmentation_levels: [minimal, standard, aggressive]
```
### 5. **Hyperparameter Optimization**
```yaml
# Fine-tune best combinations
learning_rate: [4e-4, 5e-4, 6e-4]
batch_size: [24, 32, 40]
triplet_margin: [0.25, 0.3, 0.35]
```
## Monitoring Training Progress
### Key Metrics to Watch
1. **Training Loss**: Should decrease steadily
2. **Validation Loss**: Should decrease without overfitting
3. **Triplet Accuracy**: Should increase over time
4. **Embedding Quality**: Check with a t-SNE visualization (see the sketch below)
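A sketch of the t-SNE check from metric 4, using scikit-learn; the random arrays stand in for real item embeddings and their category labels. Well-trained embeddings should form visible clusters by category:
```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

embeddings = np.random.randn(500, 512)      # placeholder item embeddings
labels = np.random.randint(0, 5, size=500)  # placeholder category labels

points = TSNE(n_components=2, perplexity=30).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=labels, s=5, cmap="tab10")
plt.title("Item embeddings (t-SNE)")
plt.show()
```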
### Early Stopping Signs
- Loss plateaus for 5+ epochs
- Validation loss increases while training loss decreases
- Triplet accuracy stops improving
### Success Indicators
- Smooth loss curves
- Consistent improvement in metrics
- Good generalization (validation ≈ training)
## Advanced Parameter Combinations
### Memory-Constrained Training
```yaml
# For limited GPU memory
batch_size: 16
embedding_dim: 256
transformer_layers: 4
use_mixed_precision: true
channels_last: true
```
### High-Speed Training
```yaml
# For quick iterations
epochs: 10
batch_size: 128
learning_rate: 2e-3
warmup_epochs: 1
early_stopping_patience: 5
```
### Maximum Quality Training
```yaml
# For production models
epochs: 100
batch_size: 32
learning_rate: 1e-4
warmup_epochs: 10
early_stopping_patience: 20
mining_strategy: hardest
augmentation_level: aggressive
```
## Parameter Logging
### Save Your Experiments
```python
# Each training run saves: a custom config JSON, training metrics,
# model checkpoints, and training logs. A generic sketch of persisting
# a run's config (the path and keys here are illustrative):
import json

config = {"learning_rate": 1e-3, "batch_size": 64, "triplet_margin": 0.2}
with open("runs/experiment_001/config.json", "w") as f:
    json.dump(config, f, indent=2)
```
### Track Changes
```yaml
# Document parameter changes:
experiment_001:
  changes: "Increased embedding_dim from 512 to 768"
  results: "Better triplet accuracy, slower training"
  next_steps: "Try reducing learning rate"
experiment_002:
  changes: "Changed mining_strategy to hardest"
  results: "Harder training, better embeddings"
  next_steps: "Increase triplet_margin"
```
## Pro Tips
### 1. **Start Simple**
- Begin with default parameters
- Change one parameter at a time
- Document every change
### 2. **Use Quick Training**
- Test parameters with 1-5 epochs first
- Validate promising combinations with full training
- Save time by discarding bad parameter combinations early
### 3. **Monitor Resources**
- Watch GPU memory usage
- Monitor training time per epoch
- Balance quality vs. speed
### 4. **Validate Changes**
- Always check validation metrics
- Compare with baseline performance
- Ensure improvements are consistent
### 5. **Save Everything**
- Keep all experiment configs
- Save intermediate checkpoints
- Log training curves and metrics
---
**Happy Parameter Tuning!**
*Remember: The best parameters depend on your specific dataset, hardware, and requirements. Experiment systematically and document everything!*