---
title: DP-SGD Interactive Playground
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

# DP-SGD Interactive Playground

An interactive web application for exploring Differentially Private Stochastic Gradient Descent (DP-SGD) training. This tool helps users understand the privacy-utility trade-offs in privacy-preserving machine learning through realistic simulations and visualizations.

## 🚀 Recent Improvements (v2.0)

### Enhanced Chart Visualization
- **Clearer dual-axis charts**: Improved color coding and styling to distinguish accuracy (green, solid line) from loss (red, dashed line)
- **Better scaling**: Separate colored axes with appropriate ranges (0-100% for accuracy, 0-3 for loss)
- **Enhanced tooltips**: More informative hover information with better formatting
- **Visual differentiation**: Added point styles, line weights, and backgrounds for clarity

### Realistic DP-SGD Training Data
- **Research-based accuracy ranges**: 
  - ε=1: 60-72% accuracy (high privacy)
  - ε=2-3: 75-85% accuracy (balanced)
  - ε=8: 85-90% accuracy (lower privacy)
- **Consistent training progress**: Final metrics now match training chart progression
- **Realistic learning curves**: Exponential improvement with noise-dependent variation (see the sketch below)
- **Privacy-driven degradation**: Higher noise multipliers noticeably reduce final accuracy
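
The curves above can be approximated with a simple model: accuracy improves exponentially toward a plateau, and the plateau drops as the noise multiplier grows. The sketch below is a hypothetical, simplified version of that idea; the function name and constants are illustrative, not the app's exact code.

```python
import numpy as np

def simulate_accuracy_curve(epochs=15, noise_multiplier=1.1, seed=0):
    """Illustrative mock learning curve (not the app's exact implementation)."""
    rng = np.random.default_rng(seed)
    # Plateau falls as sigma grows: roughly 90% at sigma=0 down to ~60% at sigma=2.
    plateau = 0.90 - 0.15 * min(noise_multiplier, 2.0)
    t = np.arange(1, epochs + 1)
    accuracy = plateau * (1.0 - np.exp(-t / 3.0))                      # exponential improvement
    accuracy += rng.normal(0.0, 0.01 * noise_multiplier, size=epochs)  # noise-dependent jitter
    return np.clip(accuracy, 0.0, 1.0)

print(simulate_accuracy_curve(epochs=10, noise_multiplier=1.0).round(3))
```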

### Improved Parameter Recommendations
- **Noise multiplier guidance**: Optimal range σ = 0.8-1.5 for good trade-offs
- **Batch size recommendations**: ≥128 for DP-SGD stability
- **Learning rate advice**: ≤0.02 for noisy training environments
- **Epochs guidance**: 8-20 epochs for good convergence vs privacy cost

### Dynamic Privacy-Utility Display
- **Real-time privacy budget**: Shows the ε value calculated from the chosen parameters (see the example after this list)
- **Context-aware assessments**: Different recommendations based on achieved accuracy
- **Educational messaging**: Helps users understand what constitutes good/poor trade-offs
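
As an example of how such an ε value can be derived from the training parameters alone, the snippet below uses TensorFlow Privacy's RDP-based accountant. It assumes the `tensorflow-privacy` package is installed; the import path and exact API differ between versions.

```python
# Assumes tensorflow-privacy is installed; the import path varies across versions.
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy

# MNIST-sized example: 60,000 training examples, delta = 1e-5.
eps, opt_order = compute_dp_sgd_privacy.compute_dp_sgd_privacy(
    n=60_000,
    batch_size=128,
    noise_multiplier=1.1,
    epochs=15,
    delta=1e-5,
)
print(f"epsilon ≈ {eps:.2f} at RDP order {opt_order}")
```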

## Features

- **Interactive Parameter Tuning**: Adjust clipping norm, noise multiplier, batch size, learning rate, and epochs
- **Real-time Training**: Choose between mock simulation and actual MNIST training
- **Multiple Visualizations**:
  - Training progress (accuracy/loss over epochs/iterations)
  - Gradient clipping visualization
  - Privacy budget tracking
- **Smart Recommendations**: Get suggestions for improving your privacy-utility trade-off
- **Educational Content**: Learn about DP-SGD concepts through interactive exploration

## Quick Start

### Prerequisites
- Python 3.8+
- pip or conda

### Installation

1. Clone the repository:
```bash
git clone <repository-url>
cd DPSGD
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Run the application:
```bash
python3 run.py
```

4. Open your browser and navigate to `http://127.0.0.1:5000`

### Using the Application

1. **Set Parameters**: Use the sliders to adjust DP-SGD parameters
2. **Choose Training Mode**: Select either mock simulation (fast) or real MNIST training
3. **Run Training**: Click "Run Training" to see results
4. **Analyze Results**: 
   - View training progress in the interactive charts
   - Check final metrics (accuracy, loss, privacy budget)
   - Read personalized recommendations
5. **Experiment**: Try the "Use Optimal Parameters" button for research-backed settings

## Understanding the Results

### Chart Interpretation
- **Green solid line**: Model accuracy (left y-axis, 0-100%)
- **Red dashed line**: Training loss (right y-axis, 0-3)
- **Privacy Budget (ε)**: Lower values = stronger privacy protection
- **Consistent metrics**: Training progress matches final results

### Recommended Parameter Ranges
- **Clipping Norm (C)**: 1.0-2.0 (balance between privacy and utility)
- **Noise Multiplier (σ)**: 0.8-1.5 (avoid σ > 2.0 for usable models)
- **Batch Size**: 128+ (larger batches help with DP-SGD stability)
- **Learning Rate**: 0.01-0.02 (conservative rates work better with noise)
- **Epochs**: 8-20 (balance convergence vs privacy cost)
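
Taken together, a starting configuration inside these ranges might look like the following (values are illustrative, not the app's exact "optimal" defaults):

```python
# Illustrative starting point within the recommended ranges.
dp_sgd_params = {
    "l2_norm_clip": 1.0,       # clipping norm C
    "noise_multiplier": 1.1,   # sigma
    "batch_size": 256,
    "learning_rate": 0.015,
    "epochs": 12,
}
```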

### Privacy-Utility Trade-offs
- **ε < 1**: Very strong privacy, expect 60-70% accuracy
- **ε = 2-4**: Good privacy-utility balance, expect 75-85% accuracy  
- **ε > 8**: Weaker privacy, expect 85-90% accuracy

## Technical Details

### Architecture
- **Backend**: Flask with TensorFlow/Keras for real training
- **Frontend**: Vanilla JavaScript with Chart.js for visualizations
- **Training**: Supports both mock simulation and real DP-SGD with MNIST
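
A hypothetical sketch of how such a Flask backend route might look is shown below; the endpoint name, payload fields, and placeholder "training" loop are illustrative, not the app's actual API.

```python
# Hypothetical training endpoint; route name and fields are illustrative.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/train", methods=["POST"])
def train():
    params = request.get_json()                 # slider values sent by the frontend
    epochs = int(params.get("epochs", 10))
    # Placeholder history: one accuracy/loss point per epoch for Chart.js to plot.
    history = {
        "accuracy": [0.5 + 0.4 * (1 - 0.7 ** e) for e in range(1, epochs + 1)],
        "loss": [2.3 * 0.8 ** e for e in range(1, epochs + 1)],
    }
    return jsonify(history)

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```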

### Algorithms
- **Real Training**: Implements simplified DP-SGD with gradient clipping and Gaussian noise (sketched after this list)
- **Mock Training**: Research-based simulation reflecting actual DP-SGD behavior patterns
- **Privacy Calculation**: RDP-based privacy budget estimation
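
To make the simplified DP-SGD used for real training concrete, here is a minimal sketch of one training step in TensorFlow: per-example gradients are clipped to an L2 norm, summed, perturbed with Gaussian noise, and averaged. The per-example Python loop is for clarity only; this illustrates the technique rather than reproducing the app's exact implementation.

```python
import tensorflow as tf

def dp_sgd_step(model, loss_fn, optimizer, x_batch, y_batch,
                l2_norm_clip=1.0, noise_multiplier=1.1):
    """One simplified DP-SGD step (illustrative sketch)."""
    batch_size = int(tf.shape(x_batch)[0])
    summed_grads = [tf.zeros_like(v) for v in model.trainable_variables]

    # 1. Clip each example's gradient to L2 norm `l2_norm_clip`, then sum.
    for i in range(batch_size):
        with tf.GradientTape() as tape:
            loss = loss_fn(y_batch[i:i + 1], model(x_batch[i:i + 1], training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        clipped, _ = tf.clip_by_global_norm(grads, l2_norm_clip)
        summed_grads = [s + g for s, g in zip(summed_grads, clipped)]

    # 2. Add Gaussian noise scaled to the clip norm, then average over the batch.
    noisy_grads = [
        (g + tf.random.normal(tf.shape(g), stddev=noise_multiplier * l2_norm_clip)) / batch_size
        for g in summed_grads
    ]

    # 3. Apply the noisy averaged gradient as in ordinary SGD.
    optimizer.apply_gradients(zip(noisy_grads, model.trainable_variables))
```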

### Research Basis
The simulation parameters and accuracy ranges are based on recent DP-SGD research:
- "TAN without a burn: Scaling Laws of DP-SGD" (2023)
- "Unlocking High-Accuracy Differentially Private Image Classification through Scale" (2022)
- "Differentially Private Generation of Small Images" (2020)

## Contributing

We welcome contributions! Areas for improvement:
- Additional datasets beyond MNIST
- More sophisticated privacy accounting methods
- Enhanced visualizations
- Better mobile responsiveness

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- TensorFlow Privacy team for DP-SGD implementation
- Research community for privacy-preserving ML advances
- Chart.js for excellent visualization capabilities