---
title: DP-SGD Interactive Playground
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

# DP-SGD Interactive Playground

An interactive web application for exploring Differentially Private Stochastic Gradient Descent (DP-SGD) training. This tool helps users understand the privacy-utility trade-offs in privacy-preserving machine learning through realistic simulations and visualizations.

## 🚀 Recent Improvements (v2.0)

### Enhanced Chart Visualization
- **Clearer dual-axis charts**: Improved color coding and styling to distinguish accuracy (green, solid line) from loss (red, dashed line)
- **Better scaling**: Separate colored axes with appropriate ranges (0-100% for accuracy, 0-3 for loss)
- **Enhanced tooltips**: More informative hover information with better formatting
- **Visual differentiation**: Added point styles, line weights, and backgrounds for clarity

### Realistic DP-SGD Training Data
- **Research-based accuracy ranges**: 
  - ε=1: 60-72% accuracy (high privacy)
  - ε=2-3: 75-85% accuracy (balanced)
  - ε=8: 85-90% accuracy (lower privacy)
- **Consistent training progress**: Final metrics now match training chart progression
- **Realistic learning curves**: Exponential improvement with noise-dependent variation (see the sketch below)
- **Privacy-driven degradation**: Higher noise multipliers noticeably reduce final accuracy
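
The curves above can be approximated with a simple model: accuracy improves exponentially toward a plateau, and the plateau drops as the noise multiplier grows. The sketch below is a hypothetical, simplified version of that idea; the function name and constants are illustrative, not the app's exact code.

```python
import numpy as np

def simulate_accuracy_curve(epochs=15, noise_multiplier=1.1, seed=0):
    """Illustrative mock learning curve (not the app's exact implementation)."""
    rng = np.random.default_rng(seed)
    # Plateau falls as sigma grows: roughly 90% at sigma=0 down to ~60% at sigma=2.
    plateau = 0.90 - 0.15 * min(noise_multiplier, 2.0)
    t = np.arange(1, epochs + 1)
    accuracy = plateau * (1.0 - np.exp(-t / 3.0))                      # exponential improvement
    accuracy += rng.normal(0.0, 0.01 * noise_multiplier, size=epochs)  # noise-dependent jitter
    return np.clip(accuracy, 0.0, 1.0)

print(simulate_accuracy_curve(epochs=10, noise_multiplier=1.0).round(3))
```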

### Improved Parameter Recommendations
- **Noise multiplier guidance**: Optimal range σ = 0.8-1.5 for good trade-offs
- **Batch size recommendations**: ≥128 for DP-SGD stability
- **Learning rate advice**: ≤0.02 for noisy training environments
- **Epochs guidance**: 8-20 epochs for good convergence vs privacy cost

### Dynamic Privacy-Utility Display
- **Real-time privacy budget**: Shows the ε value calculated from the chosen parameters (see the example after this list)
- **Context-aware assessments**: Different recommendations based on achieved accuracy
- **Educational messaging**: Helps users understand what constitutes good/poor trade-offs
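
As an example of how such an ε value can be derived from the training parameters alone, the snippet below uses TensorFlow Privacy's RDP-based accountant. It assumes the `tensorflow-privacy` package is installed; the import path and exact API differ between versions.

```python
# Assumes tensorflow-privacy is installed; the import path varies across versions.
from tensorflow_privacy.privacy.analysis import compute_dp_sgd_privacy

# MNIST-sized example: 60,000 training examples, delta = 1e-5.
eps, opt_order = compute_dp_sgd_privacy.compute_dp_sgd_privacy(
    n=60_000,
    batch_size=128,
    noise_multiplier=1.1,
    epochs=15,
    delta=1e-5,
)
print(f"epsilon ≈ {eps:.2f} at RDP order {opt_order}")
```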

## Features

- **Interactive Parameter Tuning**: Adjust clipping norm, noise multiplier, batch size, learning rate, and epochs
- **Real-time Training**: Choose between mock simulation and actual MNIST training
- **Multiple Visualizations**:
  - Training progress (accuracy/loss over epochs/iterations)
  - Gradient clipping visualization
  - Privacy budget tracking
- **Smart Recommendations**: Get suggestions for improving your privacy-utility trade-off
- **Educational Content**: Learn about DP-SGD concepts through interactive exploration

## Quick Start

### Prerequisites
- Python 3.8+
- pip or conda

### Installation

1. Clone the repository:
```bash
git clone <repository-url>
cd DPSGD
```

2. Install dependencies:
```bash
pip install -r requirements.txt
```

3. Run the application:
```bash
python3 run.py
```

4. Open your browser and navigate to `http://127.0.0.1:5000`

### Using the Application

1. **Set Parameters**: Use the sliders to adjust DP-SGD parameters
2. **Choose Training Mode**: Select either mock simulation (fast) or real MNIST training
3. **Run Training**: Click "Run Training" to see results
4. **Analyze Results**: 
   - View training progress in the interactive charts
   - Check final metrics (accuracy, loss, privacy budget)
   - Read personalized recommendations
5. **Experiment**: Try the "Use Optimal Parameters" button for research-backed settings

## Understanding the Results

### Chart Interpretation
- **Green solid line**: Model accuracy (left y-axis, 0-100%)
- **Red dashed line**: Training loss (right y-axis, 0-3)
- **Privacy Budget (ε)**: Lower values = stronger privacy protection
- **Consistent metrics**: Training progress matches final results

### Recommended Parameter Ranges
- **Clipping Norm (C)**: 1.0-2.0 (balance between privacy and utility)
- **Noise Multiplier (σ)**: 0.8-1.5 (avoid σ > 2.0 for usable models)
- **Batch Size**: 128+ (larger batches help with DP-SGD stability)
- **Learning Rate**: 0.01-0.02 (conservative rates work better with noise)
- **Epochs**: 8-20 (balance convergence vs privacy cost)
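
Taken together, a starting configuration inside these ranges might look like the following (values are illustrative, not the app's exact "optimal" defaults):

```python
# Illustrative starting point within the recommended ranges.
dp_sgd_params = {
    "l2_norm_clip": 1.0,       # clipping norm C
    "noise_multiplier": 1.1,   # sigma
    "batch_size": 256,
    "learning_rate": 0.015,
    "epochs": 12,
}
```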

### Privacy-Utility Trade-offs
- **ε < 1**: Very strong privacy, expect 60-70% accuracy
- **ε = 2-4**: Good privacy-utility balance, expect 75-85% accuracy  
- **ε > 8**: Weaker privacy, expect 85-90% accuracy

## Technical Details

### Architecture
- **Backend**: Flask with TensorFlow/Keras for real training
- **Frontend**: Vanilla JavaScript with Chart.js for visualizations
- **Training**: Supports both mock simulation and real DP-SGD with MNIST
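
A hypothetical sketch of how such a Flask backend route might look is shown below; the endpoint name, payload fields, and placeholder "training" loop are illustrative, not the app's actual API.

```python
# Hypothetical training endpoint; route name and fields are illustrative.
from flask import Flask, jsonify, request

app = Flask(__name__)

@app.route("/api/train", methods=["POST"])
def train():
    params = request.get_json()                 # slider values sent by the frontend
    epochs = int(params.get("epochs", 10))
    # Placeholder history: one accuracy/loss point per epoch for Chart.js to plot.
    history = {
        "accuracy": [0.5 + 0.4 * (1 - 0.7 ** e) for e in range(1, epochs + 1)],
        "loss": [2.3 * 0.8 ** e for e in range(1, epochs + 1)],
    }
    return jsonify(history)

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```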

### Algorithms
- **Real Training**: Implements simplified DP-SGD with gradient clipping and Gaussian noise (sketched after this list)
- **Mock Training**: Research-based simulation reflecting actual DP-SGD behavior patterns
- **Privacy Calculation**: RDP-based privacy budget estimation
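
To make the simplified DP-SGD used for real training concrete, here is a minimal sketch of one training step in TensorFlow: per-example gradients are clipped to an L2 norm, summed, perturbed with Gaussian noise, and averaged. The per-example Python loop is for clarity only; this illustrates the technique rather than reproducing the app's exact implementation.

```python
import tensorflow as tf

def dp_sgd_step(model, loss_fn, optimizer, x_batch, y_batch,
                l2_norm_clip=1.0, noise_multiplier=1.1):
    """One simplified DP-SGD step (illustrative sketch)."""
    batch_size = int(tf.shape(x_batch)[0])
    summed_grads = [tf.zeros_like(v) for v in model.trainable_variables]

    # 1. Clip each example's gradient to L2 norm `l2_norm_clip`, then sum.
    for i in range(batch_size):
        with tf.GradientTape() as tape:
            loss = loss_fn(y_batch[i:i + 1], model(x_batch[i:i + 1], training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        clipped, _ = tf.clip_by_global_norm(grads, l2_norm_clip)
        summed_grads = [s + g for s, g in zip(summed_grads, clipped)]

    # 2. Add Gaussian noise scaled to the clip norm, then average over the batch.
    noisy_grads = [
        (g + tf.random.normal(tf.shape(g), stddev=noise_multiplier * l2_norm_clip)) / batch_size
        for g in summed_grads
    ]

    # 3. Apply the noisy averaged gradient as in ordinary SGD.
    optimizer.apply_gradients(zip(noisy_grads, model.trainable_variables))
```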

### Research Basis
The simulation parameters and accuracy ranges are based on recent DP-SGD research:
- "TAN without a burn: Scaling Laws of DP-SGD" (2023)
- "Unlocking High-Accuracy Differentially Private Image Classification through Scale" (2022)
- "Differentially Private Generation of Small Images" (2020)

## Contributing

We welcome contributions! Areas for improvement:
- Additional datasets beyond MNIST
- More sophisticated privacy accounting methods
- Enhanced visualizations
- Better mobile responsiveness

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- TensorFlow Privacy team for DP-SGD implementation
- Research community for privacy-preserving ML advances
- Chart.js for excellent visualization capabilities