---
title: DP-SGD Interactive Playground
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---
# DP-SGD Interactive Playground
An interactive web application for exploring Differentially Private Stochastic Gradient Descent (DP-SGD) training. This tool helps users understand the privacy-utility trade-offs in privacy-preserving machine learning through realistic simulations and visualizations.
## 🚀 Recent Improvements (v2.0)
### Enhanced Chart Visualization
- **Clearer dual-axis charts**: Improved color coding and styling to distinguish accuracy (green, solid line) from loss (red, dashed line)
- **Better scaling**: Separate colored axes with appropriate ranges (0-100% for accuracy, 0-3 for loss)
- **Enhanced tooltips**: More informative hover information with better formatting
- **Visual differentiation**: Added point styles, line weights, and backgrounds for clarity
### Realistic DP-SGD Training Data
- **Research-based accuracy ranges**:
  - ε=1: 60-72% accuracy (high privacy)
  - ε=2-3: 75-85% accuracy (balanced)
  - ε=8: 85-90% accuracy (lower privacy)
- **Consistent training progress**: Final metrics now match training chart progression
- **Realistic learning curves**: Exponential improvement with noise-dependent variation (see the sketch after this list)
- **Proper privacy degradation**: Higher noise multipliers measurably reduce final accuracy
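As a rough illustration of how such a curve can be generated (the function name, constants, and ε-to-ceiling mapping below are ours, not the app's actual code), mock mode can model accuracy as an exponential approach to a privacy-dependent ceiling with more jitter at stronger privacy:

```python
import numpy as np

def mock_accuracy_curve(epsilon, epochs, seed=0):
    """Illustrative DP-SGD learning curve: exponential approach to a
    privacy-dependent ceiling, with more jitter at stronger privacy."""
    rng = np.random.default_rng(seed)
    # Ceiling loosely follows the accuracy ranges listed above.
    ceiling = 0.66 if epsilon <= 1 else 0.80 if epsilon <= 3 else 0.88
    t = np.arange(1, epochs + 1)
    curve = ceiling * (1 - np.exp(-3 * t / epochs))  # exponential improvement
    jitter = rng.normal(0, 0.015 * (1 + 1 / epsilon), size=epochs)
    return np.clip(curve + jitter, 0.0, 1.0)
```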
### Improved Parameter Recommendations
- **Noise multiplier guidance**: Optimal range σ = 0.8-1.5 for good trade-offs
- **Batch size recommendations**: ≥128 for DP-SGD stability
- **Learning rate advice**: ≤0.02 for noisy training environments
- **Epochs guidance**: 8-20 epochs for good convergence vs privacy cost
### Dynamic Privacy-Utility Display
- **Real-time privacy budget**: Shows calculated ε values based on the actual parameters (see the accounting sketch after this list)
- **Context-aware assessments**: Different recommendations based on achieved accuracy
- **Educational messaging**: Helps users understand what constitutes good/poor trade-offs
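A simplified version of this kind of ε estimate can be written as RDP composition of the Gaussian mechanism. The sketch below is our own, not the app's accountant: it ignores privacy amplification by subsampling, so it returns a conservative (larger) ε than a full RDP accountant would:

```python
import math

def estimate_epsilon(noise_multiplier, steps, delta=1e-5):
    """Simplified RDP accounting for DP-SGD.

    Composes the Gaussian mechanism over `steps` iterations and converts
    RDP to (eps, delta)-DP. Ignores privacy amplification by subsampling,
    so the result is a conservative upper bound on epsilon."""
    best = float("inf")
    for alpha in range(2, 256):
        # RDP of one Gaussian step at order alpha is alpha / (2 * sigma^2);
        # RDP composes additively over steps.
        rdp = steps * alpha / (2 * noise_multiplier ** 2)
        # Standard RDP -> (eps, delta) conversion.
        eps = rdp + math.log(1 / delta) / (alpha - 1)
        best = min(best, eps)
    return best

# Example: sigma = 1.1, 10 epochs at ~469 steps/epoch on MNIST (batch 128)
print(estimate_epsilon(noise_multiplier=1.1, steps=4690))
```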
## Features
- **Interactive Parameter Tuning**: Adjust clipping norm, noise multiplier, batch size, learning rate, and epochs
- **Real-time Training**: Choose between mock simulation and actual MNIST training
- **Multiple Visualizations**:
  - Training progress (accuracy/loss over epochs/iterations)
  - Gradient clipping visualization
  - Privacy budget tracking
- **Smart Recommendations**: Get suggestions for improving your privacy-utility trade-off
- **Educational Content**: Learn about DP-SGD concepts through interactive exploration
## Quick Start
### Prerequisites
- Python 3.8+
- pip or conda
### Installation
1. Clone the repository:
```bash
git clone <repository-url>
cd DPSGD
```
2. Install dependencies:
```bash
pip install -r requirements.txt
```
3. Run the application:
```bash
python3 run.py
```
4. Open your browser and navigate to `http://127.0.0.1:5000`
### Using the Application
1. **Set Parameters**: Use the sliders to adjust DP-SGD parameters
2. **Choose Training Mode**: Pick either mock simulation (fast) or real MNIST training
3. **Run Training**: Click "Run Training" to see results
4. **Analyze Results**:
   - View training progress in the interactive charts
   - Check final metrics (accuracy, loss, privacy budget)
   - Read personalized recommendations
5. **Experiment**: Try the "Use Optimal Parameters" button for research-backed settings
## Understanding the Results
### Chart Interpretation
- **Green solid line**: Model accuracy (left y-axis, 0-100%)
- **Red dashed line**: Training loss (right y-axis, 0-3)
- **Privacy Budget (ε)**: Lower values = stronger privacy protection
- **Consistent metrics**: Training progress matches final results
### Recommended Parameter Ranges
- **Clipping Norm (C)**: 1.0-2.0 (balance between privacy and utility)
- **Noise Multiplier (σ)**: 0.8-1.5 (avoid σ > 2.0 for usable models)
- **Batch Size**: 128+ (larger batches help with DP-SGD stability)
- **Learning Rate**: 0.01-0.02 (conservative rates work better with noise)
- **Epochs**: 8-20 (balance convergence vs privacy cost)
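Pulling those ranges together, a reasonable starting configuration might look like the following (the key names are illustrative, not the app's actual configuration schema):

```python
# Illustrative settings drawn from the ranges above (key names are
# hypothetical, not the app's actual configuration schema).
dp_sgd_params = {
    "clipping_norm": 1.0,     # C: per-example gradient clip
    "noise_multiplier": 1.1,  # sigma: std of added noise = sigma * C
    "batch_size": 256,        # >= 128 for stability
    "learning_rate": 0.015,   # conservative rate under noise
    "epochs": 12,             # convergence vs. privacy cost
}
```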
### Privacy-Utility Trade-offs
- **ε ≤ 1**: Very strong privacy, expect 60-72% accuracy
- **ε = 2-4**: Good privacy-utility balance, expect 75-85% accuracy
- **ε ≥ 8**: Weaker privacy, expect 85-90% accuracy
## Technical Details
### Architecture
- **Backend**: Flask with TensorFlow/Keras for real training
- **Frontend**: Vanilla JavaScript with Chart.js for visualizations
- **Training**: Supports both mock simulation and real DP-SGD with MNIST
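As a minimal illustration of that architecture (the route, payload, and helper below are assumptions for the sketch, not the app's real API), the backend pattern is a single JSON endpoint that the frontend posts slider values to:

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_training(**params):
    """Stub standing in for the app's mock/real DP-SGD driver."""
    return {"accuracy": [0.52, 0.68, 0.79], "loss": [1.9, 1.2, 0.8], "epsilon": 3.2}

@app.route("/api/train", methods=["POST"])  # route name is hypothetical
def train():
    params = request.get_json()       # slider values from the UI
    history = run_training(**params)  # epoch-by-epoch metrics
    return jsonify(history)           # fed directly into Chart.js datasets

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```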
### Algorithms
- **Real Training**: Implements simplified DP-SGD with per-example gradient clipping and Gaussian noise (sketched after this list)
- **Mock Training**: Research-based simulation reflecting actual DP-SGD behavior patterns
- **Privacy Calculation**: RDP-based privacy budget estimation
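The core of the real-training mode is the standard DP-SGD step from Abadi et al. (2016): clip each example's gradient, sum, add Gaussian noise scaled to the clipping norm, then average. A minimal NumPy sketch (ours, not the app's TensorFlow code):

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, lr, weights):
    """One simplified DP-SGD update.

    per_example_grads: array of shape (batch_size, num_params)."""
    # 1. Clip each example's gradient to L2 norm <= clip_norm.
    norms = np.linalg.norm(per_example_grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale
    # 2. Sum and add Gaussian noise with std = noise_multiplier * clip_norm.
    noisy_sum = clipped.sum(axis=0) + np.random.normal(
        0.0, noise_multiplier * clip_norm, size=clipped.shape[1])
    # 3. Average over the batch and take a gradient step.
    return weights - lr * noisy_sum / len(per_example_grads)
```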
### Research Basis
The simulation parameters and accuracy ranges are based on recent DP-SGD research:
- "TAN without a burn: Scaling Laws of DP-SGD" (2023)
- "Unlocking High-Accuracy Differentially Private Image Classification through Scale" (2022)
- "Differentially Private Generation of Small Images" (2020)
## Contributing
We welcome contributions! Areas for improvement:
- Additional datasets beyond MNIST
- More sophisticated privacy accounting methods
- Enhanced visualizations
- Better mobile responsiveness
## License
This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments
- TensorFlow Privacy team for DP-SGD implementation
- Research community for privacy-preserving ML advances
- Chart.js for excellent visualization capabilities