# Dressify Training Parameters Guide
## Overview
The Dressify system provides **comprehensive parameter control** for both ResNet item embedder and ViT outfit encoder training. This guide covers all the "knobs" you can tweak to experiment with different training configurations.
## ResNet Item Embedder Parameters
### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Backbone Architecture** | `resnet50`, `resnet101` | `resnet50` | Base CNN architecture for feature extraction |
| **Embedding Dimension** | 128-1024 | 512 | Output embedding vector size (must match ViT input) |
| **Use ImageNet Pretrained** | `true`/`false` | `true` | Initialize with ImageNet weights |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in projection head for regularization |
### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 20 | Total training iterations |
| **Batch Size** | 8-128 | 64 | Images per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 1e-3 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-6 to 1e-2 | 1e-4 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.2 | Distance margin for triplet loss |
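To make these knobs concrete, here is a minimal PyTorch sketch of how the architecture and training parameters above could fit together. The `ItemEmbedder` name and the projection-head layout are illustrative assumptions, not Dressify's actual code:

```python
import torch
import torch.nn as nn
from torchvision import models

class ItemEmbedder(nn.Module):
    """ResNet backbone plus a projection head emitting L2-normalized embeddings."""
    def __init__(self, backbone="resnet50", embed_dim=512, pretrained=True, dropout=0.1):
        super().__init__()
        resnet = getattr(models, backbone)(weights="DEFAULT" if pretrained else None)
        in_features = resnet.fc.in_features      # 2048 for resnet50/101
        resnet.fc = nn.Identity()                # strip the ImageNet classifier
        self.backbone = resnet
        self.head = nn.Sequential(               # projection head with dropout
            nn.Linear(in_features, embed_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(embed_dim, embed_dim),
        )

    def forward(self, x):
        z = self.head(self.backbone(x))
        return nn.functional.normalize(z, dim=-1)

model = ItemEmbedder()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
criterion = nn.TripletMarginLoss(margin=0.2)
```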
## ViT Outfit Encoder Parameters
### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Embedding Dimension** | 128-1024 | 512 | Input embedding size (must match ResNet output) |
| **Transformer Layers** | 2-12 | 6 | Number of transformer encoder layers |
| **Attention Heads** | 4-16 | 8 | Number of multi-head attention heads |
| **Feed-Forward Multiplier** | 2-8 | 4 | Feed-forward hidden size as a multiple of the embedding dimension |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in transformer layers |
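A minimal sketch of how the table above maps onto PyTorch's built-in transformer encoder. The mean pooling at the end is an assumption; a CLS-token readout would work equally well:

```python
import torch
import torch.nn as nn

# Defaults taken straight from the table above.
embed_dim, n_layers, n_heads, ff_mult, dropout = 512, 6, 8, 4, 0.1
assert embed_dim % n_heads == 0, "attention heads must divide the embedding dim"

layer = nn.TransformerEncoderLayer(
    d_model=embed_dim,
    nhead=n_heads,
    dim_feedforward=ff_mult * embed_dim,   # 4 * 512 = 2048 hidden units
    dropout=dropout,
    batch_first=True,
)
outfit_encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

items = torch.randn(32, 5, embed_dim)                  # 32 outfits of 5 items each
outfit_embedding = outfit_encoder(items).mean(dim=1)   # pool items per outfit
```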
### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 30 | Total training iterations |
| **Batch Size** | 4-64 | 32 | Outfits per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 5e-4 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-4 to 1e-1 | 5e-2 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.3 | Distance margin for triplet loss |
## Advanced Training Settings
### Hardware Optimization
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Mixed Precision (AMP)** | `true`/`false` | `true` | Use automatic mixed precision for faster training |
| **Channels Last Memory** | `true`/`false` | `true` | Use channels_last format for CUDA optimization |
| **Gradient Clipping** | 0.1-5.0 | 1.0 | Clip gradients to prevent explosion |
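The three hardware knobs combine naturally in a single training step. A sketch assuming the `model`, `optimizer`, and `criterion` from the ResNet section above, with synthetic tensors standing in for a real batch:

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device, memory_format=torch.channels_last)   # channels_last layout
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

anchor = torch.randn(64, 3, 224, 224, device=device).to(memory_format=torch.channels_last)
positive = torch.randn_like(anchor)
negative = torch.randn_like(anchor)

optimizer.zero_grad(set_to_none=True)
with torch.cuda.amp.autocast(enabled=(device == "cuda")):     # mixed-precision forward
    loss = criterion(model(anchor), model(positive), model(negative))
scaler.scale(loss).backward()
scaler.unscale_(optimizer)                                    # unscale before clipping
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
scaler.step(optimizer)
scaler.update()
```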
### Learning Rate Scheduling
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Warmup Epochs** | 0-10 | 3 | Gradual learning rate increase at start |
| **Learning Rate Scheduler** | `cosine`, `step`, `plateau`, `linear` | `cosine` | LR decay strategy |
| **Early Stopping Patience** | 5-20 | 10 | Epochs without improvement before training stops |
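One common way to realize warmup plus cosine decay with stock PyTorch schedulers (a sketch; Dressify's trainer may chain them differently):

```python
import torch

warmup_epochs, total_epochs = 3, 20
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.1, total_iters=warmup_epochs)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_epochs - warmup_epochs)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_epochs])

for epoch in range(total_epochs):
    # ... one epoch of training ...
    scheduler.step()   # advance the learning rate once per epoch
```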
### Training Strategy
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Triplet Mining Strategy** | `semi_hard`, `hardest`, `random` | `semi_hard` | Negative sample selection method |
| **Data Augmentation Level** | `minimal`, `standard`, `aggressive` | `standard` | Image augmentation intensity |
| **Random Seed** | 0-9999 | 42 | Reproducible training results |
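Semi-hard mining selects negatives that are farther from the anchor than the positive, but still inside the margin. A batch-level sketch; the function name and the hardest-negative fallback are assumptions, not Dressify's exact miner:

```python
import torch

torch.manual_seed(42)  # the "Random Seed" knob: reproducible batches and mining

def semi_hard_negatives(anchors, positives, candidates, margin=0.2):
    """For each anchor, pick a negative that is farther than the positive
    but still within the margin (the semi-hard condition)."""
    d_ap = (anchors - positives).pow(2).sum(dim=1)      # (B,) squared distances
    d_an = torch.cdist(anchors, candidates).pow(2)      # (B, N)
    semi_hard = (d_an > d_ap[:, None]) & (d_an < d_ap[:, None] + margin)
    # Hide non-semi-hard candidates, then take the closest remaining one.
    masked = torch.where(semi_hard, d_an, torch.full_like(d_an, float("inf")))
    idx = masked.argmin(dim=1)
    # Fallback: anchors with no semi-hard negative get their hardest one.
    no_match = torch.isinf(masked.gather(1, idx[:, None]).squeeze(1))
    idx[no_match] = d_an[no_match].argmin(dim=1)
    return candidates[idx]
```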
## Parameter Impact Analysis
### High Impact Parameters (Experiment First)
#### 1. **Learning Rate**
- **Too High**: Training instability, loss spikes
- **Too Low**: Slow convergence, stuck in local minima
- **Sweet Spot**: 1e-3 for ResNet, 5e-4 for ViT
- **Try**: 1e-4, 1e-3, 5e-3, 1e-2
#### 2. **Batch Size**
- **Small**: Better generalization, slower training
- **Large**: Faster training, but often worse generalization
- **Memory Constraint**: GPU VRAM limits maximum size
- **Try**: 16, 32, 64, 128
#### 3. **Triplet Margin**
- **Small**: Easier triplets, faster convergence
- **Large**: Harder triplets, better embeddings
- **Balance**: 0.2-0.3 typically optimal
- **Try**: 0.1, 0.2, 0.3, 0.5 (see the snippet below)
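To make the margin concrete: the standard triplet loss is `max(0, d(a,p) - d(a,n) + margin)`, so the loss reaches zero exactly when the negative is at least `margin` farther from the anchor than the positive is:

```python
import torch
import torch.nn.functional as F

# Both examples use margin = 0.2 and an anchor-positive distance of 0.4.
print(F.relu(torch.tensor(0.4 - 0.7 + 0.2)))  # tensor(0.): negative far enough
print(F.relu(torch.tensor(0.4 - 0.5 + 0.2)))  # tensor(0.1000): still penalized
```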
### Medium Impact Parameters
#### 4. **Embedding Dimension**
- **Small**: Faster inference, less expressive
- **Large**: More expressive, slower inference
- **Trade-off**: 512 is a good balance
- **Try**: 256, 512, 768, 1024
#### 5. **Transformer Layers**
- **Few**: Faster training, less capacity
- **Many**: More capacity, slower training
- **Sweet Spot**: 4-8 layers
- **Try**: 4, 6, 8, 12
#### 6. **Optimizer Choice**
- **AdamW**: Best for most cases (default)
- **Adam**: Good alternative
- **SGD**: Better generalization, slower convergence
- **RMSprop**: Alternative to Adam
### Low Impact Parameters (Fine-tune Last)
#### 7. **Weight Decay**
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 1e-4 (ResNet), 5e-2 (ViT)
#### 8. **Dropout Rate**
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 0.1 for both models
#### 9. **Attention Heads**
- **Rule**: Must divide the embedding dimension evenly
- **Default**: 8 heads for 512 dimensions
- **Try**: 4, 8, 16
## Recommended Parameter Combinations
### Quick Experimentation
```yaml
# Fast Training (Low Quality)
resnet_epochs: 5
vit_epochs: 10
batch_size: 16
learning_rate: 1e-3
```
### Balanced Training
```yaml
# Standard Quality (Default)
resnet_epochs: 20
vit_epochs: 30
batch_size: 64
learning_rate: 1e-3
triplet_margin: 0.2
```
### High Quality Training
```yaml
# High Quality (Longer Training)
resnet_epochs: 50
vit_epochs: 100
batch_size: 32
learning_rate: 5e-4
triplet_margin: 0.3
warmup_epochs: 5
```
### Research Experiments
```yaml
# Research Configuration
resnet_backbone: resnet101
embedding_dim: 768
transformer_layers: 8
attention_heads: 12
mining_strategy: hardest
augmentation_level: aggressive
```
## Parameter Tuning Workflow
### 1. **Baseline Training**
```bash
# Start with default parameters
./scripts/train_item.sh
./scripts/train_outfit.sh
```
### 2. **Learning Rate Sweep**
```yaml
# Test different learning rates
learning_rates: [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
epochs: 5 # Quick test
```
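In script form, the sweep is just a loop over short runs. `train()` here is a hypothetical helper that runs one configuration and returns the final validation loss; it is not a function shipped with Dressify:

```python
results = {}
for lr in [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]:
    results[lr] = train(learning_rate=lr, epochs=5)   # quick 5-epoch probe
best_lr = min(results, key=results.get)
print(f"best learning rate: {best_lr:.0e} (val loss {results[best_lr]:.4f})")
```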
### 3. **Architecture Search**
```yaml
# Test different model sizes
embedding_dims: [256, 512, 768, 1024]
transformer_layers: [4, 6, 8, 12]
```
### 4. **Training Strategy**
```yaml
# Test different strategies
mining_strategies: [random, semi_hard, hardest]
augmentation_levels: [minimal, standard, aggressive]
```
### 5. **Hyperparameter Optimization**
```yaml
# Fine-tune best combinations
learning_rate: [4e-4, 5e-4, 6e-4]
batch_size: [24, 32, 40]
triplet_margin: [0.25, 0.3, 0.35]
```
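The same pattern generalizes to a small grid via `itertools.product`, reusing the hypothetical `train()` helper from the sweep above:

```python
import itertools

grid = itertools.product([4e-4, 5e-4, 6e-4],    # learning_rate
                         [24, 32, 40],          # batch_size
                         [0.25, 0.30, 0.35])    # triplet_margin
best = min(grid, key=lambda cfg: train(learning_rate=cfg[0],
                                       batch_size=cfg[1],
                                       triplet_margin=cfg[2],
                                       epochs=5))
print("best (lr, batch_size, margin):", best)
```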
## Monitoring Training Progress
### Key Metrics to Watch
1. **Training Loss**: Should decrease steadily
2. **Validation Loss**: Should decrease without overfitting
3. **Triplet Accuracy**: Should increase over time
4. **Embedding Quality**: Check with t-SNE visualization (sketch below)
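A quick way to eyeball embedding quality: well-trained embeddings should form visible per-category clusters. This sketch assumes `embeddings` and `labels` come from a validation pass through the item embedder (random data stands in here):

```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

embeddings = np.random.randn(500, 512)       # stand-in for real item embeddings
labels = np.random.randint(0, 10, size=500)  # stand-in for category labels
points = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab10", s=8)
plt.title("t-SNE of item embeddings")
plt.savefig("tsne_items.png", dpi=150)
```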
### Early Stopping Signs
- Loss plateaus for 5+ epochs
- Validation loss increases while training loss decreases
- Triplet accuracy stops improving (see the sketch below)
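These signs are exactly what the patience knob automates. A minimal early-stopping loop, with `validate()` and `save_checkpoint()` as hypothetical helpers:

```python
best_val, patience, bad_epochs = float("inf"), 10, 0
for epoch in range(100):
    val_loss = validate()          # hypothetical: one validation pass
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0
        save_checkpoint()          # hypothetical: keep the best weights
    else:
        bad_epochs += 1
        if bad_epochs >= patience:
            print(f"early stop at epoch {epoch}: no improvement for {patience} epochs")
            break
```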
### Success Indicators
- Smooth loss curves
- Consistent improvement in metrics
- Good generalization (validation ≈ training)
## Advanced Parameter Combinations
### Memory-Constrained Training
```yaml
# For limited GPU memory
batch_size: 16
embedding_dim: 256
transformer_layers: 4
use_mixed_precision: true
channels_last: true
```
### High-Speed Training
```yaml
# For quick iterations
epochs: 10
batch_size: 128
learning_rate: 2e-3
warmup_epochs: 1
early_stopping_patience: 5
```
### Maximum Quality Training
```yaml
# For production models
epochs: 100
batch_size: 32
learning_rate: 1e-4
warmup_epochs: 10
early_stopping_patience: 20
mining_strategy: hardest
augmentation_level: aggressive
```
## Parameter Logging
### Save Your Experiments
```python
import json, os

# Each training run saves: the custom config JSON, training metrics,
# model checkpoints, and training logs. A minimal config dump
# (the runs/ path is illustrative):
config = {"learning_rate": 1e-3, "batch_size": 64, "triplet_margin": 0.2}
os.makedirs("runs/experiment_001", exist_ok=True)
with open("runs/experiment_001/config.json", "w") as f:
    json.dump(config, f, indent=2)
```
### Track Changes
```yaml
# Document parameter changes:
experiment_001:
changes: "Increased embedding_dim from 512 to 768"
results: "Better triplet accuracy, slower training"
next_steps: "Try reducing learning rate"
experiment_002:
changes: "Changed mining_strategy to hardest"
results: "Harder training, better embeddings"
next_steps: "Increase triplet_margin"
```
## Pro Tips
### 1. **Start Simple**
- Begin with default parameters
- Change one parameter at a time
- Document every change
### 2. **Use Quick Training**
- Test parameters with 1-5 epochs first
- Validate promising combinations with full training
- Save time on bad parameter combinations
### 3. **Monitor Resources**
- Watch GPU memory usage
- Monitor training time per epoch
- Balance quality vs. speed
### 4. **Validate Changes**
- Always check validation metrics
- Compare with baseline performance
- Ensure improvements are consistent
### 5. **Save Everything**
- Keep all experiment configs
- Save intermediate checkpoints
- Log training curves and metrics
---
**Happy Parameter Tuning!**
*Remember: The best parameters depend on your specific dataset, hardware, and requirements. Experiment systematically and document everything!*