# 🎯 Dressify Training Parameters Guide
## Overview
The Dressify system provides **comprehensive parameter control** for training both the ResNet item embedder and the ViT outfit encoder. This guide covers the "knobs" you can tweak to experiment with different training configurations.
## 🖼️ ResNet Item Embedder Parameters
### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Backbone Architecture** | `resnet50`, `resnet101` | `resnet50` | Base CNN architecture for feature extraction |
| **Embedding Dimension** | 128-1024 | 512 | Output embedding vector size (must match ViT input) |
| **Use ImageNet Pretrained** | `true`/`false` | `true` | Initialize with ImageNet weights |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in projection head for regularization |
### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 20 | Total training iterations |
| **Batch Size** | 8-128 | 64 | Images per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 1e-3 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-6 to 1e-2 | 1e-4 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.2 | Distance margin for triplet loss |
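To make these knobs concrete, here is a minimal PyTorch sketch of an item embedder assembled from the two tables above. The `ItemEmbedder` class and its projection-head layout are illustrative assumptions, not the exact Dressify implementation:
```python
import torch
import torch.nn as nn
import torchvision.models as models

class ItemEmbedder(nn.Module):
    """Illustrative sketch: ResNet backbone + projection head (not the exact Dressify code)."""

    def __init__(self, backbone: str = "resnet50", embedding_dim: int = 512,
                 pretrained: bool = True, dropout: float = 0.1):
        super().__init__()
        weights = "IMAGENET1K_V2" if pretrained else None
        base = getattr(models, backbone)(weights=weights)
        in_features = base.fc.in_features            # 2048 for resnet50/101
        base.fc = nn.Identity()                      # keep the pooled CNN features
        self.backbone = base
        self.head = nn.Sequential(                   # projection head with dropout
            nn.Linear(in_features, embedding_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(embedding_dim, embedding_dim),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        emb = self.head(self.backbone(images))
        return nn.functional.normalize(emb, dim=-1)  # unit-norm embeddings for triplet loss

model = ItemEmbedder(pretrained=False)               # pretrained=False only to avoid a download here
print(model(torch.randn(2, 3, 224, 224)).shape)      # torch.Size([2, 512])
```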
## 🧠 ViT Outfit Encoder Parameters
### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Embedding Dimension** | 128-1024 | 512 | Input embedding size (must match ResNet output) |
| **Transformer Layers** | 2-12 | 6 | Number of transformer encoder layers |
| **Attention Heads** | 4-16 | 8 | Number of multi-head attention heads |
| **Feed-Forward Multiplier** | 2-8 | 4 | Hidden layer size multiplier |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in transformer layers |
### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 30 | Total training iterations |
| **Batch Size** | 4-64 | 32 | Outfits per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 5e-4 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-4 to 1e-1 | 5e-2 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.3 | Distance margin for triplet loss |
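A corresponding sketch of the outfit encoder, treating each outfit as a sequence of item embeddings pooled through a CLS-style token. The `OutfitEncoder` class and the CLS pooling are assumptions for illustration:
```python
import torch
import torch.nn as nn

class OutfitEncoder(nn.Module):
    """Illustrative sketch of a ViT-style outfit encoder over item embeddings."""

    def __init__(self, embedding_dim: int = 512, num_layers: int = 6,
                 num_heads: int = 8, ff_mult: int = 4, dropout: float = 0.1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embedding_dim,
            nhead=num_heads,
            dim_feedforward=ff_mult * embedding_dim,  # feed-forward multiplier
            dropout=dropout,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embedding_dim))

    def forward(self, item_embeddings: torch.Tensor) -> torch.Tensor:
        # item_embeddings: (batch, num_items, embedding_dim) from the ResNet embedder
        cls = self.cls_token.expand(item_embeddings.size(0), -1, -1)
        x = torch.cat([cls, item_embeddings], dim=1)
        return self.encoder(x)[:, 0]                 # CLS position as the outfit embedding

encoder = OutfitEncoder()                            # defaults from the tables above
print(encoder(torch.randn(2, 4, 512)).shape)         # torch.Size([2, 512])
```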
## ⚙️ Advanced Training Settings
### Hardware Optimization
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Mixed Precision (AMP)** | `true`/`false` | `true` | Use automatic mixed precision for faster training |
| **Channels Last Memory** | `true`/`false` | `true` | Use channels_last format for CUDA optimization |
| **Gradient Clipping** | 0.1-5.0 | 1.0 | Clip gradients to prevent explosion |
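Here is how these three settings typically fit together in a PyTorch training step. The tiny model, random data, and cross-entropy loss are placeholders for illustration, not the Dressify training loop:
```python
import torch
import torch.nn as nn

# Minimal sketch of one training step with AMP, channels_last, and gradient clipping.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4))
model = model.to(device).to(memory_format=torch.channels_last)    # channels_last layout
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))    # no-op on CPU

images = torch.randn(16, 3, 64, 64, device=device).to(memory_format=torch.channels_last)
targets = torch.randint(0, 4, (16,), device=device)

optimizer.zero_grad(set_to_none=True)
with torch.autocast(device_type=device, enabled=(device == "cuda")):  # mixed precision region
    loss = nn.functional.cross_entropy(model(images), targets)
scaler.scale(loss).backward()
scaler.unscale_(optimizer)                                        # clip on unscaled gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping at 1.0
scaler.step(optimizer)
scaler.update()
```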
### Learning Rate Scheduling
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Warmup Epochs** | 0-10 | 3 | Gradual learning rate increase at start |
| **Learning Rate Scheduler** | `cosine`, `step`, `plateau`, `linear` | `cosine` | LR decay strategy |
| **Early Stopping Patience** | 5-20 | 10 | Epochs without validation improvement before training stops |
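A sketch of warmup followed by cosine decay using PyTorch's built-in schedulers; chaining them with `SequentialLR` is one reasonable implementation, not necessarily Dressify's:
```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

# Linear warmup then cosine decay, using the defaults above (3 warmup epochs, 20 total).
params = [torch.nn.Parameter(torch.zeros(1))]        # placeholder parameters
optimizer = torch.optim.AdamW(params, lr=1e-3)

warmup_epochs, total_epochs = 3, 20
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.1, total_iters=warmup_epochs),  # warmup phase
        CosineAnnealingLR(optimizer, T_max=total_epochs - warmup_epochs),  # cosine decay
    ],
    milestones=[warmup_epochs],
)

for epoch in range(total_epochs):
    optimizer.step()                                 # stand-in for one training epoch
    scheduler.step()
    print(epoch, round(optimizer.param_groups[0]["lr"], 6))
```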
### Training Strategy
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Triplet Mining Strategy** | `semi_hard`, `hardest`, `random` | `semi_hard` | Negative sample selection method |
| **Data Augmentation Level** | `minimal`, `standard`, `aggressive` | `standard` | Image augmentation intensity |
| **Random Seed** | 0-9999 | 42 | Reproducible training results |
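The mining strategy controls which negatives the triplet loss sees. Below is a generic sketch of in-batch semi-hard selection; it is one common formulation, not necessarily the exact miner Dressify uses:
```python
import torch

def semi_hard_negatives(embeddings: torch.Tensor, labels: torch.Tensor,
                        margin: float = 0.2) -> torch.Tensor:
    """Pick one semi-hard negative index per anchor (generic sketch).

    A negative is 'semi-hard' when it lies beyond the hardest positive but still
    inside the margin: d(a, p) < d(a, n) < d(a, p) + margin.
    """
    dist = torch.cdist(embeddings, embeddings)                  # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_dist = dist.masked_fill(~same, 0).max(dim=1).values     # hardest positive per anchor

    neg_dist = dist.masked_fill(same, float("inf"))             # mask out self and positives
    semi_hard = (neg_dist > pos_dist.unsqueeze(1)) & (neg_dist < (pos_dist + margin).unsqueeze(1))
    candidate = neg_dist.masked_fill(~semi_hard, float("inf"))  # prefer semi-hard negatives
    fallback = neg_dist.argmin(dim=1)                           # else take the hardest negative
    chosen = candidate.argmin(dim=1)
    return torch.where(semi_hard.any(dim=1), chosen, fallback)

emb = torch.nn.functional.normalize(torch.randn(8, 512), dim=-1)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(semi_hard_negatives(emb, labels))                         # one negative index per anchor
```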
## 🔬 Parameter Impact Analysis
### High Impact Parameters (Experiment First)
#### 1. **Learning Rate** 🎯
- **Too High**: Training instability, loss spikes
- **Too Low**: Slow convergence, stuck in local minima
- **Sweet Spot**: 1e-3 for ResNet, 5e-4 for ViT
- **Try**: 1e-4, 1e-3, 5e-3, 1e-2
#### 2. **Batch Size** 📦
- **Small**: Better generalization, slower training
- **Large**: Faster training per epoch, but generalization may suffer
- **Memory Constraint**: GPU VRAM limits maximum size
- **Try**: 16, 32, 64, 128
#### 3. **Triplet Margin** 📏
- **Small**: Constraint is easy to satisfy; faster convergence, weaker separation
- **Large**: Harder constraint; better-separated embeddings, slower convergence
- **Balance**: 0.2-0.3 typically optimal
- **Try**: 0.1, 0.2, 0.3, 0.5
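To see how the margin enters the loss, here is a quick numeric check with random unit vectors standing in for real embeddings:
```python
import torch
import torch.nn as nn

# The margin appears directly in the loss: L = max(d(a, p) - d(a, n) + margin, 0).
anchor = nn.functional.normalize(torch.randn(4, 512), dim=-1)
positive = nn.functional.normalize(torch.randn(4, 512), dim=-1)
negative = nn.functional.normalize(torch.randn(4, 512), dim=-1)

for margin in (0.1, 0.2, 0.3, 0.5):                  # the values suggested above
    loss = nn.TripletMarginLoss(margin=margin)(anchor, positive, negative)
    print(margin, float(loss))                       # larger margin -> larger (harder) loss
```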
### Medium Impact Parameters
#### 4. **Embedding Dimension** 🔢
- **Small**: Faster inference, less expressive
- **Large**: More expressive, slower inference
- **Trade-off**: 512 is good balance
- **Try**: 256, 512, 768, 1024
#### 5. **Transformer Layers** 🏗️
- **Few**: Faster training, less capacity
- **Many**: More capacity, slower training
- **Sweet Spot**: 4-8 layers
- **Try**: 4, 6, 8, 12
#### 6. **Optimizer Choice** ⚑
- **AdamW**: Best for most cases (default)
- **Adam**: Good alternative
- **SGD**: Better generalization, slower convergence
- **RMSprop**: Alternative to Adam
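If you wire the optimizer knob yourself, a small factory keeps the mapping in one place. `build_optimizer` is a hypothetical helper, not part of the Dressify codebase:
```python
import torch

def build_optimizer(params, name: str = "adamw", lr: float = 1e-3,
                    weight_decay: float = 1e-4) -> torch.optim.Optimizer:
    """Hypothetical factory mapping the optimizer knob to a torch.optim class."""
    if name == "adamw":
        return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)
    if name == "adam":
        return torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=weight_decay)
    if name == "rmsprop":
        return torch.optim.RMSprop(params, lr=lr, weight_decay=weight_decay)
    raise ValueError(f"Unknown optimizer: {name}")

params = [torch.nn.Parameter(torch.zeros(1))]        # placeholder parameters
opt = build_optimizer(params, "adamw")               # ResNet defaults from the table above
```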
### Low Impact Parameters (Fine-tune Last)
#### 7. **Weight Decay** 🛡️
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 1e-4 (ResNet), 5e-2 (ViT)
#### 8. **Dropout Rate** 💧
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 0.1 for both models
#### 9. **Attention Heads** 👁️
- **Rule**: Should divide embedding dimension evenly
- **Default**: 8 heads for 512 dimensions
- **Try**: 4, 8, 16
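A quick sanity check of the divisibility rule (PyTorch's attention layers enforce the same constraint):
```python
import torch.nn as nn

embedding_dim, num_heads = 512, 8
assert embedding_dim % num_heads == 0, "attention heads must divide the embedding dimension"
print(embedding_dim // num_heads)    # 64 dimensions per head

# PyTorch enforces the same rule; nhead=10 with d_model=512 would raise an error.
nn.TransformerEncoderLayer(d_model=embedding_dim, nhead=num_heads, batch_first=True)
```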
## 🚀 Recommended Parameter Combinations
### Quick Experimentation
```yaml
# Fast Training (Low Quality)
resnet_epochs: 5
vit_epochs: 10
batch_size: 16
learning_rate: 1e-3
```
### Balanced Training
```yaml
# Standard Quality (Default)
resnet_epochs: 20
vit_epochs: 30
batch_size: 64
learning_rate: 1e-3
triplet_margin: 0.2
```
### High Quality Training
```yaml
# High Quality (Longer Training)
resnet_epochs: 50
vit_epochs: 100
batch_size: 32
learning_rate: 5e-4
triplet_margin: 0.3
warmup_epochs: 5
```
### Research Experiments
```yaml
# Research Configuration
resnet_backbone: resnet101
embedding_dim: 768
transformer_layers: 8
attention_heads: 12
mining_strategy: hardest
augmentation_level: aggressive
```
## 📊 Parameter Tuning Workflow
### 1. **Baseline Training** 📈
```bash
# Start with default parameters
./scripts/train_item.sh
./scripts/train_outfit.sh
```
### 2. **Learning Rate Sweep** 🔍
```yaml
# Test different learning rates
learning_rates: [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
epochs: 5 # Quick test
```
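A sketch of a driver loop that expands a grid like the one above into per-run configs. `run_training` is a hypothetical hook, not an existing Dressify function:
```python
import itertools
import json

# Expand the sweep grid into one config per run.
grid = {"learning_rate": [1e-4, 5e-4, 1e-3, 5e-3, 1e-2], "epochs": [5]}

def run_training(config: dict) -> None:
    print("would train with:", json.dumps(config))   # placeholder for a real training call

for values in itertools.product(*grid.values()):
    run_training(dict(zip(grid.keys(), values)))
```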
### 3. **Architecture Search** 🏗️
```yaml
# Test different model sizes
embedding_dims: [256, 512, 768, 1024]
transformer_layers: [4, 6, 8, 12]
```
### 4. **Training Strategy** 🎯
```yaml
# Test different strategies
mining_strategies: [random, semi_hard, hardest]
augmentation_levels: [minimal, standard, aggressive]
```
### 5. **Hyperparameter Optimization** ⚑
```yaml
# Fine-tune best combinations
learning_rate: [4e-4, 5e-4, 6e-4]
batch_size: [24, 32, 40]
triplet_margin: [0.25, 0.3, 0.35]
```
## 🧪 Monitoring Training Progress
### Key Metrics to Watch
1. **Training Loss**: Should decrease steadily
2. **Validation Loss**: Should decrease without overfitting
3. **Triplet Accuracy**: Should increase over time
4. **Embedding Quality**: Check with t-SNE visualization
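For the embedding-quality check in point 4, a quick t-SNE plot is usually enough. The snippet below uses random vectors as stand-ins, so swap in real embeddings and labels from a checkpoint:
```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Quick embedding-quality check with t-SNE (random data as a placeholder).
embeddings = np.random.randn(200, 512).astype(np.float32)
labels = np.random.randint(0, 5, size=200)

points = TSNE(n_components=2, perplexity=30, init="pca", random_state=42).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab10", s=10)
plt.title("Item embeddings (t-SNE): well-trained models show category clusters")
plt.savefig("tsne_embeddings.png", dpi=150)
```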
### Early Stopping Signs
- Loss plateaus for 5+ epochs
- Validation loss increases while training loss decreases
- Triplet accuracy stops improving
### Success Indicators
- Smooth loss curves
- Consistent improvement in metrics
- Good generalization (validation ≈ training)
## 🔧 Advanced Parameter Combinations
### Memory-Constrained Training
```yaml
# For limited GPU memory
batch_size: 16
embedding_dim: 256
transformer_layers: 4
use_mixed_precision: true
channels_last: true
```
### High-Speed Training
```yaml
# For quick iterations
epochs: 10
batch_size: 128
learning_rate: 2e-3
warmup_epochs: 1
early_stopping_patience: 5
```
### Maximum Quality Training
```yaml
# For production models
epochs: 100
batch_size: 32
learning_rate: 1e-4
warmup_epochs: 10
early_stopping_patience: 20
mining_strategy: hardest
augmentation_level: aggressive
```
## 📝 Parameter Logging
### Save Your Experiments
```python
# Each training run saves:
# - Custom config JSON
# - Training metrics
# - Model checkpoints
# - Training logs
```
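A minimal sketch of persisting a run's configuration as JSON; the `runs/<timestamp>/` layout and the config keys are illustrative assumptions, not Dressify's actual output format:
```python
import json
import time
from pathlib import Path

# Save the run's config next to its outputs so the experiment is reproducible.
config = {
    "model": "vit_outfit_encoder",
    "epochs": 30,
    "batch_size": 32,
    "learning_rate": 5e-4,
    "triplet_margin": 0.3,
}
run_dir = Path("runs") / time.strftime("%Y%m%d_%H%M%S")
run_dir.mkdir(parents=True, exist_ok=True)
(run_dir / "config.json").write_text(json.dumps(config, indent=2))
```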
### Track Changes
```yaml
# Document parameter changes:
experiment_001:
  changes: "Increased embedding_dim from 512 to 768"
  results: "Better triplet accuracy, slower training"
  next_steps: "Try reducing learning rate"
experiment_002:
  changes: "Changed mining_strategy to hardest"
  results: "Harder training, better embeddings"
  next_steps: "Increase triplet_margin"
```
## 🎯 Pro Tips
### 1. **Start Simple** 🚀
- Begin with default parameters
- Change one parameter at a time
- Document every change
### 2. **Use Quick Training** ⚑
- Test parameters with 1-5 epochs first
- Validate promising combinations with full training
- Save time on bad parameter combinations
### 3. **Monitor Resources** 💻
- Watch GPU memory usage
- Monitor training time per epoch
- Balance quality vs. speed
### 4. **Validate Changes** ✅
- Always check validation metrics
- Compare with baseline performance
- Ensure improvements are consistent
### 5. **Save Everything** 💾
- Keep all experiment configs
- Save intermediate checkpoints
- Log training curves and metrics
---
**Happy Parameter Tuning! 🎉**
*Remember: The best parameters depend on your specific dataset, hardware, and requirements. Experiment systematically and document everything!*