# 🎯 Dressify Training Parameters Guide
## Overview
The Dressify system provides **comprehensive parameter control** for training both the ResNet item embedder and the ViT outfit encoder. This guide covers the "knobs" you can tweak to experiment with different training configurations.
## 🖼️ ResNet Item Embedder Parameters
### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Backbone Architecture** | `resnet50`, `resnet101` | `resnet50` | Base CNN architecture for feature extraction |
| **Embedding Dimension** | 128-1024 | 512 | Output embedding vector size (must match ViT input) |
| **Use ImageNet Pretrained** | `true`/`false` | `true` | Initialize with ImageNet weights |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in projection head for regularization |
### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 20 | Total training iterations |
| **Batch Size** | 8-128 | 64 | Images per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 1e-3 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-6 to 1e-2 | 1e-4 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.2 | Distance margin for triplet loss |
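To make these knobs concrete, here is a minimal PyTorch sketch of an item embedder assembled from the two tables above. The `ItemEmbedder` class and its projection-head layout are illustrative assumptions, not the exact Dressify implementation:
```python
import torch
import torch.nn as nn
import torchvision.models as models

class ItemEmbedder(nn.Module):
    """Illustrative sketch: ResNet backbone + projection head (not the exact Dressify code)."""

    def __init__(self, backbone: str = "resnet50", embedding_dim: int = 512,
                 pretrained: bool = True, dropout: float = 0.1):
        super().__init__()
        weights = "IMAGENET1K_V2" if pretrained else None
        base = getattr(models, backbone)(weights=weights)
        in_features = base.fc.in_features            # 2048 for resnet50/101
        base.fc = nn.Identity()                      # keep the pooled CNN features
        self.backbone = base
        self.head = nn.Sequential(                   # projection head with dropout
            nn.Linear(in_features, embedding_dim),
            nn.ReLU(inplace=True),
            nn.Dropout(dropout),
            nn.Linear(embedding_dim, embedding_dim),
        )

    def forward(self, images: torch.Tensor) -> torch.Tensor:
        emb = self.head(self.backbone(images))
        return nn.functional.normalize(emb, dim=-1)  # unit-norm embeddings for triplet loss

model = ItemEmbedder(pretrained=False)               # pretrained=False only to avoid a download here
print(model(torch.randn(2, 3, 224, 224)).shape)      # torch.Size([2, 512])
```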
## 🧠 ViT Outfit Encoder Parameters
### Model Architecture
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Embedding Dimension** | 128-1024 | 512 | Input embedding size (must match ResNet output) |
| **Transformer Layers** | 2-12 | 6 | Number of transformer encoder layers |
| **Attention Heads** | 4-16 | 8 | Number of multi-head attention heads |
| **Feed-Forward Multiplier** | 2-8 | 4 | Hidden layer size multiplier |
| **Dropout Rate** | 0.0-0.5 | 0.1 | Dropout in transformer layers |
### Training Parameters
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Epochs** | 1-100 | 30 | Total training iterations |
| **Batch Size** | 4-64 | 32 | Outfits per training batch |
| **Learning Rate** | 1e-5 to 1e-2 | 5e-4 | Step size for gradient descent |
| **Optimizer** | `adamw`, `adam`, `sgd`, `rmsprop` | `adamw` | Optimization algorithm |
| **Weight Decay** | 1e-4 to 1e-1 | 5e-2 | L2 regularization strength |
| **Triplet Margin** | 0.1-1.0 | 0.3 | Distance margin for triplet loss |
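A corresponding sketch of the outfit encoder, treating each outfit as a sequence of item embeddings pooled through a CLS-style token. The `OutfitEncoder` class and the CLS pooling are assumptions for illustration:
```python
import torch
import torch.nn as nn

class OutfitEncoder(nn.Module):
    """Illustrative sketch of a ViT-style outfit encoder over item embeddings."""

    def __init__(self, embedding_dim: int = 512, num_layers: int = 6,
                 num_heads: int = 8, ff_mult: int = 4, dropout: float = 0.1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(
            d_model=embedding_dim,
            nhead=num_heads,
            dim_feedforward=ff_mult * embedding_dim,  # feed-forward multiplier
            dropout=dropout,
            batch_first=True,
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.cls_token = nn.Parameter(torch.zeros(1, 1, embedding_dim))

    def forward(self, item_embeddings: torch.Tensor) -> torch.Tensor:
        # item_embeddings: (batch, num_items, embedding_dim) from the ResNet embedder
        cls = self.cls_token.expand(item_embeddings.size(0), -1, -1)
        x = torch.cat([cls, item_embeddings], dim=1)
        return self.encoder(x)[:, 0]                 # CLS position as the outfit embedding

encoder = OutfitEncoder()                            # defaults from the tables above
print(encoder(torch.randn(2, 4, 512)).shape)         # torch.Size([2, 512])
```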
## ⚙️ Advanced Training Settings
### Hardware Optimization
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Mixed Precision (AMP)** | `true`/`false` | `true` | Use automatic mixed precision for faster training |
| **Channels Last Memory** | `true`/`false` | `true` | Use channels_last format for CUDA optimization |
| **Gradient Clipping** | 0.1-5.0 | 1.0 | Clip gradients to prevent explosion |
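Here is how these three settings typically fit together in a PyTorch training step. The tiny model, random data, and cross-entropy loss are placeholders for illustration, not the Dressify training loop:
```python
import torch
import torch.nn as nn

# Minimal sketch of one training step with AMP, channels_last, and gradient clipping.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 4))
model = model.to(device).to(memory_format=torch.channels_last)    # channels_last layout
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))    # no-op on CPU

images = torch.randn(16, 3, 64, 64, device=device).to(memory_format=torch.channels_last)
targets = torch.randint(0, 4, (16,), device=device)

optimizer.zero_grad(set_to_none=True)
with torch.autocast(device_type=device, enabled=(device == "cuda")):  # mixed precision region
    loss = nn.functional.cross_entropy(model(images), targets)
scaler.scale(loss).backward()
scaler.unscale_(optimizer)                                        # clip on unscaled gradients
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping at 1.0
scaler.step(optimizer)
scaler.update()
```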
### Learning Rate Scheduling
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Warmup Epochs** | 0-10 | 3 | Gradual learning rate increase at start |
| **Learning Rate Scheduler** | `cosine`, `step`, `plateau`, `linear` | `cosine` | LR decay strategy |
| **Early Stopping Patience** | 5-20 | 10 | Epochs without validation improvement before training stops |
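A sketch of warmup followed by cosine decay using PyTorch's built-in schedulers; chaining them with `SequentialLR` is one reasonable implementation, not necessarily Dressify's:
```python
import torch
from torch.optim.lr_scheduler import CosineAnnealingLR, LinearLR, SequentialLR

# Linear warmup then cosine decay, using the defaults above (3 warmup epochs, 20 total).
params = [torch.nn.Parameter(torch.zeros(1))]        # placeholder parameters
optimizer = torch.optim.AdamW(params, lr=1e-3)

warmup_epochs, total_epochs = 3, 20
scheduler = SequentialLR(
    optimizer,
    schedulers=[
        LinearLR(optimizer, start_factor=0.1, total_iters=warmup_epochs),  # warmup phase
        CosineAnnealingLR(optimizer, T_max=total_epochs - warmup_epochs),  # cosine decay
    ],
    milestones=[warmup_epochs],
)

for epoch in range(total_epochs):
    optimizer.step()                                 # stand-in for one training epoch
    scheduler.step()
    print(epoch, round(optimizer.param_groups[0]["lr"], 6))
```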
### Training Strategy
| Parameter | Range | Default | Description |
|-----------|-------|---------|-------------|
| **Triplet Mining Strategy** | `semi_hard`, `hardest`, `random` | `semi_hard` | Negative sample selection method |
| **Data Augmentation Level** | `minimal`, `standard`, `aggressive` | `standard` | Image augmentation intensity |
| **Random Seed** | 0-9999 | 42 | Reproducible training results |
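The mining strategy controls which negatives the triplet loss sees. Below is a generic sketch of in-batch semi-hard selection; it is one common formulation, not necessarily the exact miner Dressify uses:
```python
import torch

def semi_hard_negatives(embeddings: torch.Tensor, labels: torch.Tensor,
                        margin: float = 0.2) -> torch.Tensor:
    """Pick one semi-hard negative index per anchor (generic sketch).

    A negative is 'semi-hard' when it lies beyond the hardest positive but still
    inside the margin: d(a, p) < d(a, n) < d(a, p) + margin.
    """
    dist = torch.cdist(embeddings, embeddings)                  # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos_dist = dist.masked_fill(~same, 0).max(dim=1).values     # hardest positive per anchor

    neg_dist = dist.masked_fill(same, float("inf"))             # mask out self and positives
    semi_hard = (neg_dist > pos_dist.unsqueeze(1)) & (neg_dist < (pos_dist + margin).unsqueeze(1))
    candidate = neg_dist.masked_fill(~semi_hard, float("inf"))  # prefer semi-hard negatives
    fallback = neg_dist.argmin(dim=1)                           # else take the hardest negative
    chosen = candidate.argmin(dim=1)
    return torch.where(semi_hard.any(dim=1), chosen, fallback)

emb = torch.nn.functional.normalize(torch.randn(8, 512), dim=-1)
labels = torch.tensor([0, 0, 1, 1, 2, 2, 3, 3])
print(semi_hard_negatives(emb, labels))                         # one negative index per anchor
```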
## 🔬 Parameter Impact Analysis
### High Impact Parameters (Experiment First)
#### 1. **Learning Rate** 🎯
- **Too High**: Training instability, loss spikes
- **Too Low**: Slow convergence, stuck in local minima
- **Sweet Spot**: 1e-3 for ResNet, 5e-4 for ViT
- **Try**: 1e-4, 1e-3, 5e-3, 1e-2
#### 2. **Batch Size** 📦
- **Small**: Better generalization, slower training
- **Large**: Faster training per epoch, but generalization may suffer
- **Memory Constraint**: GPU VRAM limits maximum size
- **Try**: 16, 32, 64, 128
#### 3. **Triplet Margin** 📏
- **Small**: Constraint is easy to satisfy; faster convergence, weaker separation
- **Large**: Harder constraint; better-separated embeddings, slower convergence
- **Balance**: 0.2-0.3 typically optimal
- **Try**: 0.1, 0.2, 0.3, 0.5
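To see how the margin enters the loss, here is a quick numeric check with random unit vectors standing in for real embeddings:
```python
import torch
import torch.nn as nn

# The margin appears directly in the loss: L = max(d(a, p) - d(a, n) + margin, 0).
anchor = nn.functional.normalize(torch.randn(4, 512), dim=-1)
positive = nn.functional.normalize(torch.randn(4, 512), dim=-1)
negative = nn.functional.normalize(torch.randn(4, 512), dim=-1)

for margin in (0.1, 0.2, 0.3, 0.5):                  # the values suggested above
    loss = nn.TripletMarginLoss(margin=margin)(anchor, positive, negative)
    print(margin, float(loss))                       # larger margin -> larger (harder) loss
```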
### Medium Impact Parameters
#### 4. **Embedding Dimension** 🔢
- **Small**: Faster inference, less expressive
- **Large**: More expressive, slower inference
- **Trade-off**: 512 is good balance
- **Try**: 256, 512, 768, 1024
#### 5. **Transformer Layers** 🏗️
- **Few**: Faster training, less capacity
- **Many**: More capacity, slower training
- **Sweet Spot**: 4-8 layers
- **Try**: 4, 6, 8, 12
#### 6. **Optimizer Choice** ⚑
- **AdamW**: Best for most cases (default)
- **Adam**: Good alternative
- **SGD**: Better generalization, slower convergence
- **RMSprop**: Alternative to Adam
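If you wire the optimizer knob yourself, a small factory keeps the mapping in one place. `build_optimizer` is a hypothetical helper, not part of the Dressify codebase:
```python
import torch

def build_optimizer(params, name: str = "adamw", lr: float = 1e-3,
                    weight_decay: float = 1e-4) -> torch.optim.Optimizer:
    """Hypothetical factory mapping the optimizer knob to a torch.optim class."""
    if name == "adamw":
        return torch.optim.AdamW(params, lr=lr, weight_decay=weight_decay)
    if name == "adam":
        return torch.optim.Adam(params, lr=lr, weight_decay=weight_decay)
    if name == "sgd":
        return torch.optim.SGD(params, lr=lr, momentum=0.9, weight_decay=weight_decay)
    if name == "rmsprop":
        return torch.optim.RMSprop(params, lr=lr, weight_decay=weight_decay)
    raise ValueError(f"Unknown optimizer: {name}")

params = [torch.nn.Parameter(torch.zeros(1))]        # placeholder parameters
opt = build_optimizer(params, "adamw")               # ResNet defaults from the table above
```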
### Low Impact Parameters (Fine-tune Last)
#### 7. **Weight Decay** 🛡️
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 1e-4 (ResNet), 5e-2 (ViT)
#### 8. **Dropout Rate** 💧
- **Small**: Less regularization
- **Large**: More regularization
- **Default**: 0.1 for both models
#### 9. **Attention Heads** 👁️
- **Rule**: Should divide embedding dimension evenly
- **Default**: 8 heads for 512 dimensions
- **Try**: 4, 8, 16
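A quick sanity check of the divisibility rule (PyTorch's attention layers enforce the same constraint):
```python
import torch.nn as nn

embedding_dim, num_heads = 512, 8
assert embedding_dim % num_heads == 0, "attention heads must divide the embedding dimension"
print(embedding_dim // num_heads)    # 64 dimensions per head

# PyTorch enforces the same rule; nhead=10 with d_model=512 would raise an error.
nn.TransformerEncoderLayer(d_model=embedding_dim, nhead=num_heads, batch_first=True)
```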
## 🚀 Recommended Parameter Combinations
### Quick Experimentation
```yaml
# Fast Training (Low Quality)
resnet_epochs: 5
vit_epochs: 10
batch_size: 16
learning_rate: 1e-3
```
### Balanced Training
```yaml
# Standard Quality (Default)
resnet_epochs: 20
vit_epochs: 30
batch_size: 64
learning_rate: 1e-3
triplet_margin: 0.2
```
### High Quality Training
```yaml
# High Quality (Longer Training)
resnet_epochs: 50
vit_epochs: 100
batch_size: 32
learning_rate: 5e-4
triplet_margin: 0.3
warmup_epochs: 5
```
### Research Experiments
```yaml
# Research Configuration
resnet_backbone: resnet101
embedding_dim: 768
transformer_layers: 8
attention_heads: 12
mining_strategy: hardest
augmentation_level: aggressive
```
## 📊 Parameter Tuning Workflow
### 1. **Baseline Training** 📈
```bash
# Start with default parameters
./scripts/train_item.sh
./scripts/train_outfit.sh
```
### 2. **Learning Rate Sweep** 🔍
```yaml
# Test different learning rates
learning_rates: [1e-4, 5e-4, 1e-3, 5e-3, 1e-2]
epochs: 5 # Quick test
```
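A sketch of a driver loop that expands a grid like the one above into per-run configs. `run_training` is a hypothetical hook, not an existing Dressify function:
```python
import itertools
import json

# Expand the sweep grid into one config per run.
grid = {"learning_rate": [1e-4, 5e-4, 1e-3, 5e-3, 1e-2], "epochs": [5]}

def run_training(config: dict) -> None:
    print("would train with:", json.dumps(config))   # placeholder for a real training call

for values in itertools.product(*grid.values()):
    run_training(dict(zip(grid.keys(), values)))
```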
### 3. **Architecture Search** 🏗️
```yaml
# Test different model sizes
embedding_dims: [256, 512, 768, 1024]
transformer_layers: [4, 6, 8, 12]
```
### 4. **Training Strategy** 🎯
```yaml
# Test different strategies
mining_strategies: [random, semi_hard, hardest]
augmentation_levels: [minimal, standard, aggressive]
```
### 5. **Hyperparameter Optimization** ⚑
```yaml
# Fine-tune best combinations
learning_rate: [4e-4, 5e-4, 6e-4]
batch_size: [24, 32, 40]
triplet_margin: [0.25, 0.3, 0.35]
```
## 🧪 Monitoring Training Progress
### Key Metrics to Watch
1. **Training Loss**: Should decrease steadily
2. **Validation Loss**: Should decrease without overfitting
3. **Triplet Accuracy**: Should increase over time
4. **Embedding Quality**: Check with t-SNE visualization
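For the embedding-quality check in point 4, a quick t-SNE plot is usually enough. The snippet below uses random vectors as stand-ins, so swap in real embeddings and labels from a checkpoint:
```python
import matplotlib.pyplot as plt
import numpy as np
from sklearn.manifold import TSNE

# Quick embedding-quality check with t-SNE (random data as a placeholder).
embeddings = np.random.randn(200, 512).astype(np.float32)
labels = np.random.randint(0, 5, size=200)

points = TSNE(n_components=2, perplexity=30, init="pca", random_state=42).fit_transform(embeddings)
plt.scatter(points[:, 0], points[:, 1], c=labels, cmap="tab10", s=10)
plt.title("Item embeddings (t-SNE): well-trained models show category clusters")
plt.savefig("tsne_embeddings.png", dpi=150)
```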
### Early Stopping Signs
- Loss plateaus for 5+ epochs
- Validation loss increases while training loss decreases
- Triplet accuracy stops improving
### Success Indicators
- Smooth loss curves
- Consistent improvement in metrics
- Good generalization (validation ≈ training)
## 🔧 Advanced Parameter Combinations
### Memory-Constrained Training
```yaml
# For limited GPU memory
batch_size: 16
embedding_dim: 256
transformer_layers: 4
use_mixed_precision: true
channels_last: true
```
### High-Speed Training
```yaml
# For quick iterations
epochs: 10
batch_size: 128
learning_rate: 2e-3
warmup_epochs: 1
early_stopping_patience: 5
```
### Maximum Quality Training
```yaml
# For production models
epochs: 100
batch_size: 32
learning_rate: 1e-4
warmup_epochs: 10
early_stopping_patience: 20
mining_strategy: hardest
augmentation_level: aggressive
```
## 📝 Parameter Logging
### Save Your Experiments
```python
# Each training run saves:
# - Custom config JSON
# - Training metrics
# - Model checkpoints
# - Training logs
```
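A minimal sketch of persisting a run's configuration as JSON; the `runs/<timestamp>/` layout and the config keys are illustrative assumptions, not Dressify's actual output format:
```python
import json
import time
from pathlib import Path

# Save the run's config next to its outputs so the experiment is reproducible.
config = {
    "model": "vit_outfit_encoder",
    "epochs": 30,
    "batch_size": 32,
    "learning_rate": 5e-4,
    "triplet_margin": 0.3,
}
run_dir = Path("runs") / time.strftime("%Y%m%d_%H%M%S")
run_dir.mkdir(parents=True, exist_ok=True)
(run_dir / "config.json").write_text(json.dumps(config, indent=2))
```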
### Track Changes
```yaml
# Document parameter changes:
experiment_001:
  changes: "Increased embedding_dim from 512 to 768"
  results: "Better triplet accuracy, slower training"
  next_steps: "Try reducing learning rate"
experiment_002:
  changes: "Changed mining_strategy to hardest"
  results: "Harder training, better embeddings"
  next_steps: "Increase triplet_margin"
```
## 🎯 Pro Tips
### 1. **Start Simple** 🚀
- Begin with default parameters
- Change one parameter at a time
- Document every change
### 2. **Use Quick Training** ⚑
- Test parameters with 1-5 epochs first
- Validate promising combinations with full training
- Save time on bad parameter combinations
### 3. **Monitor Resources** 💻
- Watch GPU memory usage
- Monitor training time per epoch
- Balance quality vs. speed
### 4. **Validate Changes** ✅
- Always check validation metrics
- Compare with baseline performance
- Ensure improvements are consistent
### 5. **Save Everything** 💾
- Keep all experiment configs
- Save intermediate checkpoints
- Log training curves and metrics
---
**Happy Parameter Tuning! 🎉**
*Remember: The best parameters depend on your specific dataset, hardware, and requirements. Experiment systematically and document everything!*