# 🎯 Dressify Training Parameters Guide

## Overview

The Dressify system provides **comprehensive parameter control** for both ResNet item embedder and ViT outfit encoder training. This guide covers all the "knobs" you can tweak to experiment with different training configurations.

## πŸ–ΌοΈ ResNet Item Embedder Parameters

### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Backbone Architecture** | `resnet50`, `resnet101` | `resnet50` | Base CNN architecture for feature extraction |
| **Embedding Dimension** | 128-1024 | 512 | Output embedding vector size (must match ViT input) |
| **Use ImageNet Pretrained** | `true`/`false` | `true` | Initialize with ImageNet weights |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in projection head for regularization |

### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 20 | Full passes over the training set |
| **Batch Size** | 8-128 | 64 | Images per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 1e-3 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-6 to 1e-2 | 1e-4 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.2 | Distance margin for triplet loss |
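
The triplet margin enters the loss as sketched below. This is the standard triplet formulation with Euclidean distance, written in plain Python for clarity; the real training loop would operate on batched tensors:

```python
def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet loss: push the negative at least `margin`
    farther from the anchor than the positive is."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    return max(dist(anchor, positive) - dist(anchor, negative) + margin, 0.0)

# Loss is zero once the negative is margin-farther than the positive
print(triplet_loss([0, 0], [0, 1], [3, 0]))  # → 0.0
print(triplet_loss([0, 0], [0, 2], [0, 1]))  # → 1.2
```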

## 🧠 ViT Outfit Encoder Parameters

### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Embedding Dimension** | 128-1024 | 512 | Input embedding size (must match ResNet output) |
| **Transformer Layers** | 2-12 | 6 | Number of transformer encoder layers |
| **Attention Heads** | 4-16 | 8 | Number of multi-head attention heads |
| **Feed-Forward Multiplier** | 2-8 | 4 | Hidden layer size multiplier |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in transformer layers |
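
A minimal sketch of such an encoder using `nn.TransformerEncoder`. The class name `OutfitEncoder`, the learned CLS token, and CLS pooling are assumptions made for illustration, not necessarily how Dressify pools the sequence:

```python
import torch
import torch.nn as nn

class OutfitEncoder(nn.Module):
    """Sketch: transformer over item embeddings, pooled via a CLS token."""
    def __init__(self, embed_dim=512, layers=6, heads=8, ff_mult=4, dropout=0.1):
        super().__init__()
        self.cls = nn.Parameter(torch.zeros(1, 1, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim, nhead=heads,
            dim_feedforward=ff_mult * embed_dim,  # feed-forward multiplier
            dropout=dropout, batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=layers)

    def forward(self, items):  # items: (batch, n_items, embed_dim)
        cls = self.cls.expand(items.size(0), -1, -1)
        out = self.encoder(torch.cat([cls, items], dim=1))
        return out[:, 0]  # CLS position summarizes the outfit
```

Note that `items` must have the same width as the ResNet output, which is why the two embedding dimensions must match.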

### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 30 | Full passes over the training set |
| **Batch Size** | 4-64 | 32 | Outfits per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 5e-4 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-4 to 1e-1 | 5e-2 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.3 | Distance margin for triplet loss |

## βš™οΈ Advanced Training Settings

### Hardware Optimization
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Mixed Precision (AMP)** | `true`/`false` | `true` | Use automatic mixed precision for faster training |
| **Channels Last Memory** | `true`/`false` | `true` | Use channels_last format for CUDA optimization |
| **Gradient Clipping** | 0.1-5.0 | 1.0 | Clip gradients to prevent explosion |
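
AMP and gradient clipping typically meet inside the training step. A sketch under stated assumptions (the function name `train_step` and the MSE placeholder loss are illustrative); the key ordering is that gradients must be unscaled before clipping:

```python
import torch
import torch.nn as nn

def train_step(model, batch, target, optimizer, scaler, clip=1.0, device="cuda"):
    """Sketch of one AMP training step with gradient clipping."""
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type=device, enabled=(device == "cuda")):
        loss = nn.functional.mse_loss(model(batch), target)  # placeholder loss
    scaler.scale(loss).backward()
    scaler.unscale_(optimizer)  # unscale BEFORE clipping, or the clip is wrong
    torch.nn.utils.clip_grad_norm_(model.parameters(), clip)
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```

The channels_last setting is applied separately, e.g. `model.to(memory_format=torch.channels_last)` plus the same conversion on input batches.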

### Learning Rate Scheduling
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Warmup Epochs** | 0-10 | 3 | Gradual learning rate increase at start |
| **Learning Rate Scheduler** | `cosine`, `step`, `plateau`, `linear` | `cosine` | LR decay strategy |
| **Early Stopping Patience** | 5-20 | 10 | Stop training if no improvement |
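
Linear warmup followed by cosine decay can be computed per epoch as below; `lr_at` is an illustrative helper, not the project's actual scheduler:

```python
import math

def lr_at(epoch, base_lr=1e-3, warmup=3, total=20, min_lr=0.0):
    """Linear warmup for `warmup` epochs, then cosine decay to min_lr."""
    if epoch < warmup:
        return base_lr * (epoch + 1) / warmup
    progress = (epoch - warmup) / max(total - warmup, 1)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Ramps up over epochs 0-2, peaks at base_lr, decays toward min_lr
print([round(lr_at(e), 6) for e in (0, 2, 3, 19)])
```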

### Training Strategy
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Triplet Mining Strategy** | `semi_hard`, `hardest`, `random` | `semi_hard` | Negative sample selection method |
| **Data Augmentation Level** | `minimal`, `standard`, `aggressive` | `standard` | Image augmentation intensity |
| **Random Seed** | 0-9999 | 42 | Reproducible training results |
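
Semi-hard mining selects negatives that are farther from the anchor than the positive, but still inside the margin. A simplified sketch over precomputed distances; the fallback to the easiest negative is one common choice, assumed here:

```python
def semi_hard_negative(d_ap, neg_dists, margin=0.2):
    """Pick a semi-hard negative: d_ap < d_an < d_ap + margin."""
    candidates = [d for d in neg_dists if d_ap < d < d_ap + margin]
    # Fall back to the easiest negative when no semi-hard one exists
    return min(candidates) if candidates else max(neg_dists)

print(semi_hard_negative(1.0, [0.5, 1.1, 1.5, 2.0]))  # → 1.1
```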

## πŸ”¬ Parameter Impact Analysis

### High Impact Parameters (Experiment First)

#### 1. **Learning Rate** 🎯
- **Too High**: Training instability, loss spikes
- **Too Low**: Slow convergence, stuck in local minima
- **Sweet Spot**: 1e-3 for ResNet, 5e-4 for ViT
- **Try**: 1e-4, 1e-3, 5e-3, 1e-2

#### 2. **Batch Size** πŸ“¦
- **Small**: Noisier gradients, better generalization, slower training
- **Large**: Faster training, but can generalize worse
- **Memory Constraint**: GPU VRAM limits maximum size
- **Try**: 16, 32, 64, 128

#### 3. **Triplet Margin** πŸ“
- **Small**: Easier triplets, faster convergence
- **Large**: Harder triplets, better embeddings
- **Balance**: 0.2-0.3 typically optimal
- **Try**: 0.1, 0.2, 0.3, 0.5

### Medium Impact Parameters

#### 4. **Embedding Dimension** πŸ”’
- **Small**: Faster inference, less expressive
- **Large**: More expressive, slower inference
- **Trade-off**: 512 is good balance
- **Try**: 256, 512, 768, 1024

#### 5. **Transformer Layers** πŸ—οΈ
- **Few**: Faster training, less capacity
- **Many**: More capacity, slower training
- **Sweet Spot**: 4-8 layers
- **Try**: 4, 6, 8, 12

#### 6. **Optimizer Choice** ⚑
- **AdamW**: Best for most cases (default)
- **Adam**: Good alternative
- **SGD**: Better generalization, slower convergence
- **RMSprop**: Alternative to Adam

### Low Impact Parameters (Fine-tune Last)

#### 7. **Weight Decay** πŸ›‘οΈ
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 1e-4 (ResNet), 5e-2 (ViT)

#### 8. **Dropout Rate** πŸ’§
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 0.1 for both models

#### 9. **Attention Heads** πŸ‘οΈ
- **Rule**: Should divide embedding dimension evenly
- **Default**: 8 heads for 512 dimensions
- **Try**: 4, 8, 16
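
The divisibility rule can be checked directly; `valid_head_counts` is a hypothetical helper for filtering candidate head counts:

```python
def valid_head_counts(embed_dim, options=(4, 8, 16)):
    """Keep only head counts that divide the embedding dimension evenly,
    so each head gets an equal slice of the vector."""
    return [h for h in options if embed_dim % h == 0]

print(valid_head_counts(512))  # → [4, 8, 16]
```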

## πŸš€ Recommended Parameter Combinations

### Quick Experimentation
```yaml
# Fast Training (Low Quality)
resnet_epochs: 5
vit_epochs: 10
batch_size: 16
learning_rate: 1e-3
```

### Balanced Training
```yaml
# Standard Quality (Default)
resnet_epochs: 20
vit_epochs: 30
batch_size: 64
learning_rate: 1e-3
triplet_margin: 0.2
```

### High Quality Training
```yaml
# High Quality (Longer Training)
resnet_epochs: 50
vit_epochs: 100
batch_size: 32
learning_rate: 5e-4
triplet_margin: 0.3
warmup_epochs: 5
```

### Research Experiments
```yaml
# Research Configuration
resnet_backbone: resnet101
embedding_dim: 768
transformer_layers: 8
attention_heads: 12
mining_strategy: hardest
augmentation_level: aggressive
```

## πŸ“Š Parameter Tuning Workflow

### 1. **Baseline Training** πŸ“ˆ
```bash
# Start with default parameters
./scripts/train_item.sh
./scripts/train_outfit.sh
```

### 2. **Learning Rate Sweep** πŸ”
```yaml
# Test different learning rates
learning_rates: [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
epochs: 5  # Quick test
```

### 3. **Architecture Search** πŸ—οΈ
```yaml
# Test different model sizes
embedding_dims: [256, 512, 768, 1024]
transformer_layers: [4, 6, 8, 12]
```

### 4. **Training Strategy** 🎯
```yaml
# Test different strategies
mining_strategies: [random, semi_hard, hardest]
augmentation_levels: [minimal, standard, aggressive]
```

### 5. **Hyperparameter Optimization** ⚑
```yaml
# Fine-tune best combinations
learning_rate: [4e-4, 5e-4, 6e-4]
batch_size: [24, 32, 40]
triplet_margin: [0.25, 0.3, 0.35]
```

## πŸ§ͺ Monitoring Training Progress

### Key Metrics to Watch
1. **Training Loss**: Should decrease steadily
2. **Validation Loss**: Should decrease without overfitting
3. **Triplet Accuracy**: Should increase over time
4. **Embedding Quality**: Check with t-SNE visualization

### Early Stopping Signs
- Loss plateaus for 5+ epochs
- Validation loss increases while training loss decreases
- Triplet accuracy stops improving

### Success Indicators
- Smooth loss curves
- Consistent improvement in metrics
- Good generalization (validation β‰ˆ training)

## πŸ”§ Advanced Parameter Combinations

### Memory-Constrained Training
```yaml
# For limited GPU memory
batch_size: 16
embedding_dim: 256
transformer_layers: 4
use_mixed_precision: true
channels_last: true
```

### High-Speed Training
```yaml
# For quick iterations
epochs: 10
batch_size: 128
learning_rate: 2e-3
warmup_epochs: 1
early_stopping_patience: 5
```

### Maximum Quality Training
```yaml
# For production models
epochs: 100
batch_size: 32
learning_rate: 1e-4
warmup_epochs: 10
early_stopping_patience: 20
mining_strategy: hardest
augmentation_level: aggressive
```

## πŸ“ Parameter Logging

### Save Your Experiments
```python
import json
from pathlib import Path

def save_experiment(run_dir, config, metrics):
    """Each training run saves its custom config JSON and training
    metrics alongside the model checkpoints and training logs."""
    run_dir = Path(run_dir)
    run_dir.mkdir(parents=True, exist_ok=True)
    (run_dir / "config.json").write_text(json.dumps(config, indent=2))
    (run_dir / "metrics.json").write_text(json.dumps(metrics, indent=2))
```

### Track Changes
```yaml
# Document parameter changes:
experiment_001:
  changes: "Increased embedding_dim from 512 to 768"
  results: "Better triplet accuracy, slower training"
  next_steps: "Try reducing learning rate"

experiment_002:
  changes: "Changed mining_strategy to hardest"
  results: "Harder training, better embeddings"
  next_steps: "Increase triplet_margin"
```

## 🎯 Pro Tips

### 1. **Start Simple** πŸš€
- Begin with default parameters
- Change one parameter at a time
- Document every change

### 2. **Use Quick Training** ⚑
- Test parameters with 1-5 epochs first
- Validate promising combinations with full training
- Save time on bad parameter combinations

### 3. **Monitor Resources** πŸ’»
- Watch GPU memory usage
- Monitor training time per epoch
- Balance quality vs. speed

### 4. **Validate Changes** βœ…
- Always check validation metrics
- Compare with baseline performance
- Ensure improvements are consistent

### 5. **Save Everything** πŸ’Ύ
- Keep all experiment configs
- Save intermediate checkpoints
- Log training curves and metrics

---

**Happy Parameter Tuning! πŸŽ‰**

*Remember: The best parameters depend on your specific dataset, hardware, and requirements. Experiment systematically and document everything!*