---
|
title: DP-SGD Interactive Playground |
|
emoji: 🛡️ |
|
colorFrom: blue |
|
colorTo: purple |
|
sdk: docker |
|
pinned: false |
|
--- |
|
|
|
# DP-SGD Interactive Playground |
|
|
|
An interactive web application for exploring Differentially Private Stochastic Gradient Descent (DP-SGD) training. This tool helps users understand the privacy-utility trade-offs in privacy-preserving machine learning through realistic simulations and visualizations. |
|
|
|
## 🚀 Recent Improvements (v2.0) |
|
|
|
### Enhanced Chart Visualization |
|
- **Clearer dual-axis charts**: Improved color coding and styling to distinguish accuracy (green, solid line) from loss (red, dashed line) |
|
- **Better scaling**: Separate colored axes with appropriate ranges (0-100% for accuracy, 0-3 for loss) |
|
- **Enhanced tooltips**: Richer, better-formatted hover details
|
- **Visual differentiation**: Added point styles, line weights, and backgrounds for clarity |
|
|
|
### Realistic DP-SGD Training Data |
|
- **Research-based accuracy ranges**: |
|
- ε=1: 60-72% accuracy (high privacy) |
|
- ε=2-3: 75-85% accuracy (balanced) |
|
- ε=8: 85-90% accuracy (lower privacy) |
|
- **Consistent training progress**: Final metrics now match training chart progression |
|
- **Realistic learning curves**: Exponential improvement with noise-dependent variation (see the sketch after this list)
|
- **Proper privacy degradation**: Higher noise multipliers visibly reduce achievable accuracy
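
As a sketch of how such a noise-aware mock curve could be produced (the function name and constants below are illustrative, not the app's actual implementation):

```python
import numpy as np

def mock_accuracy_curve(epochs, noise_multiplier, seed=0):
    """Illustrative learning curve: exponential improvement toward a plateau
    that drops, and jitters more, as the noise multiplier grows."""
    rng = np.random.default_rng(seed)
    # Higher sigma -> lower achievable accuracy ceiling (hypothetical constants).
    ceiling = 0.90 - 0.12 * max(noise_multiplier - 0.8, 0.0)
    t = np.arange(1, epochs + 1)
    curve = ceiling * (1 - np.exp(-t / 4.0))                 # exponential improvement
    jitter = rng.normal(0, 0.01 * noise_multiplier, epochs)  # noise-dependent variation
    return np.clip(curve + jitter, 0.0, 1.0)

print(mock_accuracy_curve(epochs=10, noise_multiplier=1.0).round(3))
```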
|
|
|
### Improved Parameter Recommendations |
|
- **Noise multiplier guidance**: Optimal range σ = 0.8-1.5 for good trade-offs |
|
- **Batch size recommendations**: ≥128 for DP-SGD stability |
|
- **Learning rate advice**: ≤0.02 for noisy training environments |
|
- **Epochs guidance**: 8-20 epochs to balance convergence against privacy cost (these rules of thumb are sketched below)
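
A minimal sketch of how these rules of thumb could be encoded (the function name and messages are hypothetical; see the app's source for the real recommendation logic):

```python
def recommend_parameters(noise_multiplier, batch_size, learning_rate, epochs):
    """Hypothetical checks mirroring the guidance above."""
    tips = []
    if not 0.8 <= noise_multiplier <= 1.5:
        tips.append("Keep sigma in 0.8-1.5 for a good privacy-utility trade-off.")
    if batch_size < 128:
        tips.append("Use a batch size of at least 128 for DP-SGD stability.")
    if learning_rate > 0.02:
        tips.append("Lower the learning rate to <= 0.02 for noisy training.")
    if not 8 <= epochs <= 20:
        tips.append("Train for 8-20 epochs to balance convergence and privacy cost.")
    return tips

print(recommend_parameters(noise_multiplier=2.0, batch_size=64,
                           learning_rate=0.05, epochs=30))
```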
|
|
|
### Dynamic Privacy-Utility Display |
|
- **Real-time privacy budget**: Shows the calculated ε for the current parameters (see the accountant example after this list)
|
- **Context-aware assessments**: Different recommendations based on achieved accuracy |
|
- **Educational messaging**: Helps users understand what constitutes good/poor trade-offs |
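
One common way to compute ε from the training parameters is TensorFlow Privacy's accountant. A minimal sketch, assuming `tensorflow-privacy` is installed (note that this import path has moved between releases):

```python
# pip install tensorflow-privacy
from tensorflow_privacy.privacy.analysis.compute_dp_sgd_privacy_lib import (
    compute_dp_sgd_privacy,
)

# MNIST-sized example: 60,000 training examples, delta = 1e-5.
eps, opt_order = compute_dp_sgd_privacy(
    n=60_000, batch_size=256, noise_multiplier=1.1, epochs=10, delta=1e-5
)
print(f"epsilon = {eps:.2f} (optimal RDP order {opt_order})")
```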
|
|
|
## Features |
|
|
|
- **Interactive Parameter Tuning**: Adjust clipping norm, noise multiplier, batch size, learning rate, and epochs |
|
- **Real-time Training**: Choose between mock simulation or actual MNIST training |
|
- **Multiple Visualizations**: |
|
- Training progress (accuracy/loss over epochs/iterations) |
|
- Gradient clipping visualization |
|
- Privacy budget tracking |
|
- **Smart Recommendations**: Get suggestions for improving your privacy-utility trade-off |
|
- **Educational Content**: Learn about DP-SGD concepts through interactive exploration |
|
|
|
## Quick Start |
|
|
|
### Prerequisites |
|
- Python 3.8+ |
|
- pip or conda |
|
|
|
### Installation |
|
|
|
1. Clone the repository: |
|
```bash |
|
git clone <repository-url> |
|
cd DPSGD |
|
``` |
|
|
|
2. Install dependencies: |
|
```bash |
|
pip install -r requirements.txt |
|
``` |
|
|
|
3. Run the application: |
|
```bash |
|
python3 run.py |
|
``` |
|
|
|
4. Open your browser and navigate to `http://127.0.0.1:5000` |
|
|
|
### Using the Application |
|
|
|
1. **Set Parameters**: Use the sliders to adjust DP-SGD parameters |
|
2. **Choose Training Mode**: Select between mock simulation (fast) or real MNIST training |
|
3. **Run Training**: Click "Run Training" to see results |
|
4. **Analyze Results**: |
|
- View training progress in the interactive charts |
|
- Check final metrics (accuracy, loss, privacy budget) |
|
- Read personalized recommendations |
|
5. **Experiment**: Try the "Use Optimal Parameters" button for research-backed settings |
|
|
|
## Understanding the Results |
|
|
|
### Chart Interpretation |
|
- **Green solid line**: Model accuracy (left y-axis, 0-100%) |
|
- **Red dashed line**: Training loss (right y-axis, 0-3) |
|
- **Privacy Budget (ε)**: Lower values = stronger privacy protection |
|
- **Consistent metrics**: Training progress matches final results |
|
|
|
### Recommended Parameter Ranges |
|
- **Clipping Norm (C)**: 1.0-2.0 (balance between privacy and utility) |
|
- **Noise Multiplier (σ)**: 0.8-1.5 (avoid σ > 2.0 for usable models) |
|
- **Batch Size**: 128+ (larger batches help with DP-SGD stability) |
|
- **Learning Rate**: 0.01-0.02 (conservative rates work better with noise) |
|
- **Epochs**: 8-20 (balance convergence vs privacy cost) |
|
|
|
### Privacy-Utility Trade-offs |
|
- **ε < 1**: Very strong privacy, expect 60-70% accuracy |
|
- **ε = 2-4**: Good privacy-utility balance, expect 75-85% accuracy |
|
- **ε > 8**: Weaker privacy, expect 85-90% accuracy |
|
|
|
## Technical Details |
|
|
|
### Architecture |
|
- **Backend**: Flask with TensorFlow/Keras for real training (see the endpoint sketch after this list)
|
- **Frontend**: Vanilla JavaScript with Chart.js for visualizations |
|
- **Training**: Supports both mock simulation and real DP-SGD with MNIST |
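
For orientation, a hedged sketch of what the training endpoint could look like (the route name, payload fields, and `run_training` stub are illustrative; consult the actual Flask app for the real interface):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_training(**params):
    """Stub standing in for the app's mock/real DP-SGD training dispatch."""
    return {"accuracy": [0.61, 0.74, 0.81], "loss": [1.9, 1.1, 0.7],
            "epsilon": 2.4, "params": params}

@app.route("/api/train", methods=["POST"])  # hypothetical route name
def train():
    params = request.get_json(force=True)   # slider values from the frontend
    return jsonify(run_training(**params))  # consumed by the Chart.js charts

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```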
|
|
|
### Algorithms |
|
- **Real Training**: Implements simplified DP-SGD with per-example gradient clipping and Gaussian noise (sketched below)
|
- **Mock Training**: Research-based simulation reflecting actual DP-SGD behavior patterns |
|
- **Privacy Calculation**: RDP-based privacy budget estimation |
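
The core step, following Abadi et al.'s DP-SGD, is: clip each per-example gradient to L2 norm C, sum, add Gaussian noise with standard deviation σ·C, then average and apply the update. A condensed, illustrative TensorFlow version (not the app's exact code):

```python
import tensorflow as tf

def dp_sgd_step(model, optimizer, x_batch, y_batch,
                clip_norm=1.0, noise_multiplier=1.1):
    """One simplified DP-SGD step: per-example clipping + Gaussian noise."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    summed = [tf.zeros_like(v) for v in model.trainable_variables]

    batch_size = int(x_batch.shape[0])
    for i in range(batch_size):  # Python loop for clarity; real code vectorizes.
        with tf.GradientTape() as tape:
            loss = loss_fn(y_batch[i:i + 1], model(x_batch[i:i + 1], training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        # Clip the whole per-example gradient to global L2 norm <= clip_norm.
        clipped, _ = tf.clip_by_global_norm(grads, clip_norm)
        summed = [s + g for s, g in zip(summed, clipped)]

    # Add noise with stddev sigma * C to the summed gradients, then average.
    noisy = [
        (s + tf.random.normal(tf.shape(s), stddev=noise_multiplier * clip_norm))
        / batch_size
        for s in summed
    ]
    optimizer.apply_gradients(zip(noisy, model.trainable_variables))
```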
|
|
|
### Research Basis |
|
The simulation parameters and accuracy ranges are based on recent DP-SGD research: |
|
- "TAN without a burn: Scaling Laws of DP-SGD" (2023) |
|
- "Unlocking High-Accuracy Differentially Private Image Classification through Scale" (2022) |
|
- "Differentially Private Generation of Small Images" (2020) |
|
|
|
## Contributing |
|
|
|
We welcome contributions! Areas for improvement: |
|
- Additional datasets beyond MNIST |
|
- More sophisticated privacy accounting methods |
|
- Enhanced visualizations |
|
- Better mobile responsiveness |
|
|
|
## License |
|
|
|
This project is licensed under the MIT License - see the LICENSE file for details. |
|
|
|
## Acknowledgments |
|
|
|
- TensorFlow Privacy team for DP-SGD implementation |
|
- Research community for privacy-preserving ML advances |
|
- Chart.js for excellent visualization capabilities |