---
title: DP-SGD Interactive Playground
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

# DP-SGD Interactive Playground

An interactive web application for exploring Differentially Private Stochastic Gradient Descent (DP-SGD) training. This tool helps users understand the privacy-utility trade-offs in privacy-preserving machine learning through realistic simulations and visualizations.

## 🚀 Recent Improvements (v2.0)

### Enhanced Chart Visualization

- **Clearer dual-axis charts**: Improved color coding and styling to distinguish accuracy (green, solid line) from loss (red, dashed line)
- **Better scaling**: Separate colored axes with appropriate ranges (0-100% for accuracy, 0-3 for loss)
- **Enhanced tooltips**: More informative hover information with better formatting
- **Visual differentiation**: Added point styles, line weights, and backgrounds for clarity

### Realistic DP-SGD Training Data

- **Research-based accuracy ranges**:
  - ε = 1: 60-72% accuracy (high privacy)
  - ε = 2-3: 75-85% accuracy (balanced)
  - ε = 8: 85-90% accuracy (lower privacy)
- **Consistent training progress**: Final metrics now match the training chart progression
- **Realistic learning curves**: Exponential improvement with noise-dependent variation
- **Proper privacy degradation**: Higher noise multipliers significantly reduce performance
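The "exponential improvement with noise-dependent variation" used by the mock mode can be sketched as follows. This is an illustrative model only; the function name, constants, and curve shape are assumptions for exposition, not the app's actual code:

```python
import numpy as np

def mock_accuracy_curve(noise_multiplier, epochs, seed=0):
    """Illustrative mock learning curve: an exponential rise toward a
    ceiling that drops as the noise multiplier grows."""
    rng = np.random.default_rng(seed)
    # Hypothetical ceiling: sigma = 1.0 tops out near 80%, roughly in line
    # with the accuracy ranges listed above; clipped to a plausible band.
    ceiling = np.clip(0.92 - 0.12 * noise_multiplier, 0.55, 0.92)
    t = np.arange(1, epochs + 1)
    curve = ceiling * (1.0 - np.exp(-t / 3.0))
    # Noise-dependent jitter around the smooth curve.
    curve += rng.normal(0.0, 0.005 * noise_multiplier, size=epochs)
    return np.clip(curve, 0.0, 1.0)
```

A larger `noise_multiplier` lowers the final accuracy the curve converges to, mirroring the privacy degradation bullet above.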

### Improved Parameter Recommendations

- **Noise multiplier guidance**: Optimal range σ = 0.8-1.5 for good trade-offs
- **Batch size recommendations**: ≥128 for DP-SGD stability
- **Learning rate advice**: ≤0.02 for noisy training environments
- **Epochs guidance**: 8-20 epochs for a good balance of convergence and privacy cost

### Dynamic Privacy-Utility Display

- **Real-time privacy budget**: Shows calculated ε values based on the actual parameters
- **Context-aware assessments**: Different recommendations based on achieved accuracy
- **Educational messaging**: Helps users understand what constitutes a good or poor trade-off

## Features

- **Interactive Parameter Tuning**: Adjust clipping norm, noise multiplier, batch size, learning rate, and epochs
- **Real-time Training**: Choose between a mock simulation and actual MNIST training
- **Multiple Visualizations**:
  - Training progress (accuracy/loss over epochs/iterations)
  - Gradient clipping visualization
  - Privacy budget tracking
- **Smart Recommendations**: Get suggestions for improving your privacy-utility trade-off
- **Educational Content**: Learn about DP-SGD concepts through interactive exploration

## Quick Start

### Prerequisites

- Python 3.8+
- pip or conda

### Installation

1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd DPSGD
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Run the application:

   ```bash
   python3 run.py
   ```

4. Open your browser and navigate to http://127.0.0.1:5000

## Using the Application

1. **Set Parameters**: Use the sliders to adjust DP-SGD parameters
2. **Choose Training Mode**: Select between mock simulation (fast) or real MNIST training
3. **Run Training**: Click "Run Training" to see results
4. **Analyze Results**:
   - View training progress in the interactive charts
   - Check final metrics (accuracy, loss, privacy budget)
   - Read personalized recommendations
5. **Experiment**: Try the "Use Optimal Parameters" button for research-backed settings

## Understanding the Results

### Chart Interpretation

- **Green solid line**: Model accuracy (left y-axis, 0-100%)
- **Red dashed line**: Training loss (right y-axis, 0-3)
- **Privacy budget (ε)**: Lower values mean stronger privacy protection
- **Consistent metrics**: Training progress matches the final results

### Recommended Parameter Ranges

- **Clipping norm (C)**: 1.0-2.0 (balances privacy and utility)
- **Noise multiplier (σ)**: 0.8-1.5 (avoid σ > 2.0 for usable models)
- **Batch size**: 128+ (larger batches help with DP-SGD stability)
- **Learning rate**: 0.01-0.02 (conservative rates work better with noise)
- **Epochs**: 8-20 (balance convergence against privacy cost)
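These ranges can be encoded in a small validation helper that flags out-of-range settings before a run. The names below are hypothetical and not part of the app's API:

```python
# Hypothetical encoding of the recommended ranges listed above.
RECOMMENDED_RANGES = {
    "clipping_norm": (1.0, 2.0),
    "noise_multiplier": (0.8, 1.5),
    "batch_size": (128, float("inf")),
    "learning_rate": (0.01, 0.02),
    "epochs": (8, 20),
}

def out_of_range(params):
    """Return the names of parameters falling outside the recommended ranges."""
    return [name for name, (lo, hi) in RECOMMENDED_RANGES.items()
            if name in params and not (lo <= params[name] <= hi)]
```

For example, `out_of_range({"noise_multiplier": 2.5, "batch_size": 256})` flags only the noise multiplier.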

### Privacy-Utility Trade-offs

- **ε < 1**: Very strong privacy; expect 60-70% accuracy
- **ε = 2-4**: Good privacy-utility balance; expect 75-85% accuracy
- **ε > 8**: Weaker privacy; expect 85-90% accuracy
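For intuition on how ε scales with the noise multiplier and training length, here is a deliberately simplified estimate based on RDP composition of the Gaussian mechanism. It ignores privacy amplification by subsampling, so it overstates ε relative to a full accountant; treat it as an upper-bound sketch, not the app's calculation:

```python
import math

def epsilon_upper_bound(noise_multiplier, steps, delta=1e-5):
    """Convert composed Renyi DP of the Gaussian mechanism to (eps, delta).

    Per step, the Gaussian mechanism satisfies RDP(alpha) = alpha / (2 sigma^2);
    RDP composes additively over steps and converts to (eps, delta)-DP via
    eps = RDP(alpha) + log(1/delta) / (alpha - 1), minimized over alpha > 1.
    """
    best = float("inf")
    for i in range(1, 1000):  # grid search over alpha in (1, 101)
        alpha = 1.0 + i / 10.0
        rdp = steps * alpha / (2.0 * noise_multiplier ** 2)
        eps = rdp + math.log(1.0 / delta) / (alpha - 1.0)
        best = min(best, eps)
    return best
```

More noise or fewer steps lowers ε, matching the trade-offs listed above.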

## Technical Details

### Architecture

- **Backend**: Flask with TensorFlow/Keras for real training
- **Frontend**: Vanilla JavaScript with Chart.js for visualizations
- **Training**: Supports both mock simulation and real DP-SGD on MNIST

### Algorithms

- **Real training**: Implements simplified DP-SGD with gradient clipping and Gaussian noise
- **Mock training**: Research-based simulation reflecting actual DP-SGD behavior patterns
- **Privacy calculation**: RDP-based privacy budget estimation
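The clipping-plus-noise step described above can be sketched in NumPy. This is the generic DP-SGD update recipe, not the app's exact implementation; all names here are illustrative:

```python
import numpy as np

def dp_sgd_step(per_example_grads, params, clip_norm, noise_multiplier, lr, rng):
    """One DP-SGD update: clip each example's gradient to clip_norm,
    sum, add Gaussian noise with std noise_multiplier * clip_norm,
    average over the batch, then take a gradient step."""
    clipped = [g * min(1.0, clip_norm / max(np.linalg.norm(g), 1e-12))
               for g in per_example_grads]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=params.shape)
    noisy_mean = (np.sum(clipped, axis=0) + noise) / len(clipped)
    return params - lr * noisy_mean
```

With `noise_multiplier=0` this reduces to plain SGD on clipped gradients, which is a handy sanity check when experimenting.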

## Research Basis

The simulation parameters and accuracy ranges are based on recent DP-SGD research:

- "TAN without a burn: Scaling Laws of DP-SGD" (2023)
- "Unlocking High-Accuracy Differentially Private Image Classification through Scale" (2022)
- "Differentially Private Generation of Small Images" (2020)

## Contributing

We welcome contributions! Areas for improvement:

- Additional datasets beyond MNIST
- More sophisticated privacy accounting methods
- Enhanced visualizations
- Better mobile responsiveness

## License

This project is licensed under the MIT License; see the LICENSE file for details.

## Acknowledgments

- The TensorFlow Privacy team for the DP-SGD implementation
- The research community for advances in privacy-preserving ML
- Chart.js for excellent visualization capabilities