# Dressify Training Parameters Guide
## Overview
The Dressify system provides **comprehensive parameter control** for both ResNet item embedder and ViT outfit encoder training. This guide covers all the "knobs" you can tweak to experiment with different training configurations.
## ResNet Item Embedder Parameters
### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Backbone Architecture** | `resnet50`, `resnet101` | `resnet50` | Base CNN architecture for feature extraction |
| **Embedding Dimension** | 128-1024 | 512 | Output embedding vector size (must match ViT input) |
| **Use ImageNet Pretrained** | `true`/`false` | `true` | Initialize with ImageNet weights |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in projection head for regularization |
### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 20 | Total training iterations |
| **Batch Size** | 8-128 | 64 | Images per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 1e-3 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-6 to 1e-2 | 1e-4 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.2 | Distance margin for triplet loss |
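To make these knobs concrete, here is a minimal PyTorch sketch of how the architecture and training parameters above could fit together. The `ItemEmbedder` name and the projection-head layout are illustrative assumptions, not Dressify's actual code:

```python
import torch
import torch.nn as nn
from torchvision import models

class ItemEmbedder(nn.Module):
    """ResNet backbone plus a projection head emitting L2-normalized embeddings."""
    def __init__(self, backbone="resnet50", embed_dim=512, pretrained=True, dropout=0.1):
        super().__init__()
        resnet = getattr(models, backbone)(weights="DEFAULT" if pretrained else None)
        in_features = resnet.fc.in_features      # 2048 for resnet50/101
        resnet.fc = nn.Identity()                # strip the ImageNet classifier
        self.backbone = resnet
        self.head = nn.Sequential(               # projection head with dropout
            nn.Linear(in_features, embed_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x):
        z = self.head(self.backbone(x))
        return nn.functional.normalize(z, dim=-1)

model = ItemEmbedder()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.TripletMarginLoss(margin=0.2)
```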
## ViT Outfit Encoder Parameters
### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Embedding Dimension** | 128-1024 | 512 | Input embedding size (must match ResNet output) |
| **Transformer Layers** | 2-12 | 6 | Number of transformer encoder layers |
| **Attention Heads** | 4-16 | 8 | Number of multi-head attention heads |
| **Feed-Forward Multiplier** | 2-8 | 4 | Feed-forward hidden size as a multiple of the embedding dimension |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in transformer layers |
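A minimal sketch of how the table above maps onto PyTorch's built-in transformer encoder. The mean pooling at the end is an assumption; a CLS-token readout would work equally well:

```python
import torch
import torch.nn as nn

# Defaults taken straight from the table above.
embed_dim, n_layers, n_heads, ff_mult, dropout = 512, 6, 8, 4, 0.1
assert embed_dim % n_heads == 0, "attention heads must divide the embedding dim"

layer = nn.TransformerEncoderLayer(
    d_model=embed_dim,
    nhead=n_heads,
    dim_feedforward=ff_mult * embed_dim,   # 4 * 512 = 2048 hidden units
    dropout=dropout,
    batch_first=True,
)
outfit_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

items = torch.randn(32, 5, embed_dim)                  # 32 outfits of 5 items each
outfit_embedding = outfit_encoder(items).mean(dim=1)   # pool items per outfit
```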
### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 30 | Total training iterations |
| **Batch Size** | 4-64 | 32 | Outfits per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 5e-4 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-4 to 1e-1 | 5e-2 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.3 | Distance margin for triplet loss |
## Advanced Training Settings
### Hardware Optimization
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Mixed Precision (AMP)** | `true`/`false` | `true` | Use automatic mixed precision for faster training |
| **Channels Last Memory** | `true`/`false` | `true` | Use channels_last format for CUDA optimization |
| **Gradient Clipping** | 0.1-5.0 | 1.0 | Clip gradients to prevent explosion |
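The three hardware knobs combine naturally in a single training step. A sketch assuming the `model`, `optimizer`, and `criterion` from the ResNet section above, with synthetic tensors standing in for a real batch:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device, memory_format=torch.channels_last)   # channels_last layout
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

anchor = torch.randn(64, 3, 224, 224, device=device).to(memory_format=torch.channels_last)
positive = torch.randn_like(anchor)
negative = torch.randn_like(anchor)

optimizer.zero_grad(set_to_none=True)
with torch.cuda.amp.autocast(enabled=(device == "cuda")):     # mixed-precision forward
    loss = criterion(model(anchor), model(positive), model(negative))
scaler.scale(loss).backward()
scaler.unscale_(optimizer)                                    # unscale before clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)
scaler.update()
```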
### Learning Rate Scheduling
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Warmup Epochs** | 0-10 | 3 | Gradual learning rate increase at start |
| **Learning Rate Scheduler** | `cosine`, `step`, `plateau`, `linear` | `cosine` | LR decay strategy |
| **Early Stopping Patience** | 5-20 | 10 | Epochs without improvement before training stops |
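One common way to realize warmup plus cosine decay with stock PyTorch schedulers (a sketch; Dressify's trainer may chain them differently):

```python
import torch

warmup_epochs, total_epochs = 3, 20
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=warmup_epochs)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_epochs - warmup_epochs)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

for epoch in range(total_epochs):
    # ... one epoch of training ...
    scheduler.step()   # advance the learning rate once per epoch
```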
### Training Strategy
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Triplet Mining Strategy** | `semi_hard`, `hardest`, `random` | `semi_hard` | Negative sample selection method |
| **Data Augmentation Level** | `minimal`, `standard`, `aggressive` | `standard` | Image augmentation intensity |
| **Random Seed** | 0-9999 | 42 | Reproducible training results |
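Semi-hard mining selects negatives that are farther from the anchor than the positive, but still inside the margin. A batch-level sketch; the function name and the hardest-negative fallback are assumptions, not Dressify's exact miner:

```python
import torch

torch.manual_seed(42)  # the "Random Seed" knob: reproducible batches and mining

def semi_hard_negatives(anchors, positives, candidates, margin=0.2):
    """For each anchor, pick a negative that is farther than the positive
    but still within the margin (the semi-hard condition)."""
    d_ap = (anchors - positives).pow(2).sum(dim=1)      # (B,) squared distances
    d_an = torch.cdist(anchors, candidates).pow(2)      # (B, N)
    semi_hard = (d_an > d_ap[:, None]) & (d_an < d_ap[:, None] + margin)
    # Hide non-semi-hard candidates, then take the closest remaining one.
    masked = torch.where(semi_hard, d_an, torch.full_like(d_an, float("inf")))
    idx = masked.argmin(dim=1)
    # Fallback: anchors with no semi-hard negative get their hardest one.
    no_match = torch.isinf(masked.gather(1, idx[:, None]).squeeze(1))
    idx[no_match] = d_an[no_match].argmin(dim=1)
    return candidates[idx]
```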
## Parameter Impact Analysis
### High Impact Parameters (Experiment First)
#### 1. **Learning Rate**
- **Too High**: Training instability, loss spikes
- **Too Low**: Slow convergence, stuck in local minima
- **Sweet Spot**: 1e-3 for ResNet, 5e-4 for ViT
- **Try**: 1e-4, 1e-3, 5e-3, 1e-2
#### 2. **Batch Size**
- **Small**: Better generalization, slower training
- **Large**: Faster training, but often worse generalization
- **Memory Constraint**: GPU VRAM limits maximum size
- **Try**: 16, 32, 64, 128
#### 3. **Triplet Margin**
- **Small**: Easier triplets, faster convergence
- **Large**: Harder triplets, better embeddings
- **Balance**: 0.2-0.3 typically optimal
- **Try**: 0.1, 0.2, 0.3, 0.5 (see the snippet below)
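To make the margin concrete: the standard triplet loss is `max(0, d(a,p) - d(a,n) + margin)`, so the loss reaches zero exactly when the negative is at least `margin` farther from the anchor than the positive is:

```python
import torch
import torch.nn.functional as F

# Both examples use margin = 0.2 and an anchor-positive distance of 0.4.
print(F.relu(torch.tensor(0.4 - 0.7 + 0.2)))  # tensor(0.): negative far enough
print(F.relu(torch.tensor(0.4 - 0.5 + 0.2)))  # tensor(0.1000): still penalized
```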
### Medium Impact Parameters
#### 4. **Embedding Dimension**
- **Small**: Faster inference, less expressive
- **Large**: More expressive, slower inference
- **Trade-off**: 512 is a good balance
- **Try**: 256, 512, 768, 1024
#### 5. **Transformer Layers**
- **Few**: Faster training, less capacity
- **Many**: More capacity, slower training
- **Sweet Spot**: 4-8 layers
- **Try**: 4, 6, 8, 12
#### 6. **Optimizer Choice**
- **AdamW**: Best for most cases (default)
- **Adam**: Good alternative
- **SGD**: Better generalization, slower convergence
- **RMSprop**: Alternative to Adam
### Low Impact Parameters (Fine-tune Last)
#### 7. **Weight Decay**
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 1e-4 (ResNet), 5e-2 (ViT)
#### 8. **Dropout Rate**
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 0.1 for both models
#### 9. **Attention Heads**
- **Rule**: Must divide the embedding dimension evenly
- **Default**: 8 heads for 512 dimensions
- **Try**: 4, 8, 16
## Recommended Parameter Combinations
### Quick Experimentation
```yaml
# Fast Training (Low Quality)
resnet_epochs: 5
vit_epochs: 10
batch_size: 16
learning_rate: 1e-3
```
### Balanced Training
```yaml
# Standard Quality (Default)
resnet_epochs: 20
vit_epochs: 30
batch_size: 64
learning_rate: 1e-3
triplet_margin: 0.2
```
### High Quality Training
```yaml
# High Quality (Longer Training)
resnet_epochs: 50
vit_epochs: 100
batch_size: 32
learning_rate: 5e-4
triplet_margin: 0.3
warmup_epochs: 5
```
### Research Experiments
```yaml
# Research Configuration
resnet_backbone: resnet101
embedding_dim: 768
transformer_layers: 8
attention_heads: 12
mining_strategy: hardest
augmentation_level: aggressive
```
## Parameter Tuning Workflow
### 1. **Baseline Training**
```bash
# Start with default parameters
./scripts/train_item.sh
./scripts/train_outfit.sh
```
### 2. **Learning Rate Sweep**
```yaml
# Test different learning rates
learning_rates: [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
epochs: 5 # Quick test
```
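In script form, the sweep is just a loop over short runs. `train()` here is a hypothetical helper that runs one configuration and returns the final validation loss; it is not a function shipped with Dressify:

```python
results = {}
for lr in [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]:
    results[lr] = train(learning_rate=lr, epochs=5)   # quick 5-epoch probe
best_lr = min(results, key=results.get)
print(f"best learning rate: {best_lr:.0e} (val loss {results[best_lr]:.4f})")
```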
### 3. **Architecture Search**
```yaml
# Test different model sizes
embedding_dims: [256, 512, 768, 1024]
transformer_layers: [4, 6, 8, 12]
```
### 4. **Training Strategy**
```yaml
# Test different strategies
mining_strategies: [random, semi_hard, hardest]
augmentation_levels: [minimal, standard, aggressive]
```
### 5. **Hyperparameter Optimization**
```yaml
# Fine-tune best combinations
learning_rate: [4e-4, 5e-4, 6e-4]
batch_size: [24, 32, 40]
triplet_margin: [0.25, 0.3, 0.35]
```
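The same pattern generalizes to a small grid via `itertools.product`, reusing the hypothetical `train()` helper from the sweep above:

```python
import itertools

grid = itertools.product([4e-4, 5e-4, 6e-4],    # learning_rate
                         [24, 32, 40],          # batch_size
                         [0.25, 0.30, 0.35])    # triplet_margin
best = min(grid, key=lambda cfg: train(learning_rate=cfg[0],
                                       batch_size=cfg[1],
                                       triplet_margin=cfg[2],
                                       epochs=5))
print("best (lr, batch_size, margin):", best)
```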
## Monitoring Training Progress
### Key Metrics to Watch
1. **Training Loss**: Should decrease steadily
2. **Validation Loss**: Should decrease without overfitting
3. **Triplet Accuracy**: Should increase over time
4. **Embedding Quality**: Check with t-SNE visualization (sketch below)
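A quick way to eyeball embedding quality: well-trained embeddings should form visible per-category clusters. This sketch assumes `embeddings` and `labels` come from a validation pass through the item embedder (random data stands in here):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

embeddings = np.random.randn(500, 512)       # stand-in for real item embeddings
labels = np.random.randint(0, 10, size=500)  # stand-in for category labels
points = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab10", s=8)
plt.title("t-SNE of item embeddings")
plt.savefig("tsne_items.png", dpi=150)
```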
### Early Stopping Signs
- Loss plateaus for 5+ epochs
- Validation loss increases while training loss decreases
- Triplet accuracy stops improving (see the sketch below)
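These signs are exactly what the patience knob automates. A minimal early-stopping loop, with `validate()` and `save_checkpoint()` as hypothetical helpers:

```python
best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(100):
    val_loss = validate()          # hypothetical: one validation pass
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        save_checkpoint()          # hypothetical: keep the best weights
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}: no improvement for {patience} epochs")
            break
```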
### Success Indicators
- Smooth loss curves
- Consistent improvement in metrics
- Good generalization (validation ≈ training)
## Advanced Parameter Combinations
### Memory-Constrained Training
```yaml
# For limited GPU memory
batch_size: 16
embedding_dim: 256
transformer_layers: 4
use_mixed_precision: true
channels_last: true
```
### High-Speed Training
```yaml
# For quick iterations
epochs: 10
batch_size: 128
learning_rate: 2e-3
warmup_epochs: 1
early_stopping_patience: 5
```
### Maximum Quality Training
```yaml
# For production models
epochs: 100
batch_size: 32
learning_rate: 1e-4
warmup_epochs: 10
early_stopping_patience: 20
mining_strategy: hardest
augmentation_level: aggressive
```
## Parameter Logging
### Save Your Experiments
```python
import json, os

# Each training run saves: the custom config JSON, training metrics,
# model checkpoints, and training logs. A minimal config dump
# (the runs/ path is illustrative):
config = {"learning_rate": 1e-3, "batch_size": 64, "triplet_margin": 0.2}
os.makedirs("runs/experiment_001", exist_ok=True)
with open("runs/experiment_001/config.json", "w") as f:
    json.dump(config, f, indent=2)
```
### Track Changes
```yaml
# Document parameter changes:
experiment_001:
changes: "Increased embedding_dim from 512 to 768"
results: "Better triplet accuracy, slower training"
next_steps: "Try reducing learning rate"
experiment_002:
changes: "Changed mining_strategy to hardest"
results: "Harder training, better embeddings"
next_steps: "Increase triplet_margin"
```
## Pro Tips
### 1. **Start Simple**
- Begin with default parameters
- Change one parameter at a time
- Document every change
### 2. **Use Quick Training**
- Test parameters with 1-5 epochs first
- Validate promising combinations with full training
- Save time on bad parameter combinations
### 3. **Monitor Resources**
- Watch GPU memory usage
- Monitor training time per epoch
- Balance quality vs. speed
### 4. **Validate Changes**
- Always check validation metrics
- Compare with baseline performance
- Ensure improvements are consistent
### 5. **Save Everything**
- Keep all experiment configs
- Save intermediate checkpoints
- Log training curves and metrics
---
**Happy Parameter Tuning!**
*Remember: The best parameters depend on your specific dataset, hardware, and requirements. Experiment systematically and document everything!*