---
title: DP-SGD Interactive Playground
emoji: 🛡️
colorFrom: blue
colorTo: purple
sdk: docker
pinned: false
---

# DP-SGD Interactive Playground

An interactive web application for exploring Differentially Private Stochastic Gradient Descent (DP-SGD) training. This tool helps users understand the privacy-utility trade-offs in privacy-preserving machine learning through realistic simulations and visualizations.

## 🚀 Recent Improvements (v2.0)

### Enhanced Chart Visualization
- **Clearer dual-axis charts**: Improved color coding and styling to distinguish accuracy (green, solid line) from loss (red, dashed line)
- **Better scaling**: Separate colored axes with appropriate ranges (0-100% for accuracy, 0-3 for loss)
- **Enhanced tooltips**: More informative hover information with better formatting
- **Visual differentiation**: Added point styles, line weights, and backgrounds for clarity

### Realistic DP-SGD Training Data
- **Research-based accuracy ranges**:
  - ε=1: 60-72% accuracy (high privacy)
  - ε=2-3: 75-85% accuracy (balanced)
  - ε=8: 85-90% accuracy (lower privacy)
- **Consistent training progress**: Final metrics now match training chart progression
- **Realistic learning curves**: Exponential improvement with noise-dependent variation
- **Proper privacy degradation**: Higher noise multipliers significantly reduce model accuracy

### Improved Parameter Recommendations
- **Noise multiplier guidance**: Optimal range σ = 0.8-1.5 for good trade-offs
- **Batch size recommendations**: ≥128 for DP-SGD stability
- **Learning rate advice**: ≤0.02 for noisy training environments
- **Epochs guidance**: 8-20 epochs to balance convergence against privacy cost

### Dynamic Privacy-Utility Display
- **Real-time privacy budget**: Shows calculated ε values based on actual parameters
- **Context-aware assessments**: Different recommendations based on achieved accuracy
- **Educational messaging**: Helps users understand what constitutes good/poor trade-offs

## Features

- **Interactive Parameter Tuning**: Adjust clipping norm, noise multiplier, batch size, learning rate, and epochs
- **Real-time Training**: Choose between mock simulation or actual MNIST training
- **Multiple Visualizations**:
  - Training progress (accuracy/loss over epochs/iterations)
  - Gradient clipping visualization
  - Privacy budget tracking
- **Smart Recommendations**: Get suggestions for improving your privacy-utility trade-off
- **Educational Content**: Learn about DP-SGD concepts through interactive exploration

## Quick Start

### Prerequisites
- Python 3.8+
- pip or conda

### Installation

1. Clone the repository:
   ```bash
   git clone <repository-url>
   cd DPSGD
   ```
2. Install dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Run the application:
   ```bash
   python3 run.py
   ```
4. Open your browser and navigate to `http://127.0.0.1:5000`

### Using the Application

1. **Set Parameters**: Use the sliders to adjust DP-SGD parameters
2. **Choose Training Mode**: Select between mock simulation (fast) or real MNIST training
3. **Run Training**: Click "Run Training" to see results
4. **Analyze Results**:
   - View training progress in the interactive charts
   - Check final metrics (accuracy, loss, privacy budget)
   - Read personalized recommendations
5. **Experiment**: Try the "Use Optimal Parameters" button for research-backed settings

## Understanding the Results

### Chart Interpretation
- **Green solid line**: Model accuracy (left y-axis, 0-100%)
- **Red dashed line**: Training loss (right y-axis, 0-3)
- **Privacy Budget (ε)**: Lower values = stronger privacy protection
- **Consistent metrics**: Training progress matches final results

### Recommended Parameter Ranges
- **Clipping Norm (C)**: 1.0-2.0 (balance between privacy and utility)
- **Noise Multiplier (σ)**: 0.8-1.5 (avoid σ > 2.0 for usable models)
- **Batch Size**: 128+ (larger batches help with DP-SGD stability)
- **Learning Rate**: 0.01-0.02 (conservative rates work better with noise)
- **Epochs**: 8-20 (balance convergence vs privacy cost)

### Privacy-Utility Trade-offs
- **ε < 1**: Very strong privacy, expect 60-70% accuracy
- **ε = 2-4**: Good privacy-utility balance, expect 75-85% accuracy
- **ε > 8**: Weaker privacy, expect 85-90% accuracy

## Technical Details

### Architecture
- **Backend**: Flask with TensorFlow/Keras for real training
- **Frontend**: Vanilla JavaScript with Chart.js for visualizations
- **Training**: Supports both mock simulation and real DP-SGD with MNIST

### Algorithms
- **Real Training**: Implements simplified DP-SGD with gradient clipping and Gaussian noise (see the sketch below)
- **Mock Training**: Research-based simulation reflecting actual DP-SGD behavior patterns
- **Privacy Calculation**: RDP-based privacy budget estimation
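As a concrete illustration of the real-training path, the sketch below shows the core DP-SGD update (per-example gradient clipping followed by Gaussian noise) in plain NumPy. It is a minimal sketch, not the app's actual training code: the function name `dp_sgd_step`, the flattened gradient shapes, and the parameter values are illustrative choices that mirror the recommended ranges above.

```python
import numpy as np

def dp_sgd_step(per_example_grads, params, clip_norm=1.0,
                noise_multiplier=1.1, learning_rate=0.01, rng=None):
    """One simplified DP-SGD update on gradients of shape (batch, dim):
    clip each example's gradient, sum, add Gaussian noise, average, step."""
    rng = np.random.default_rng() if rng is None else rng
    batch_size = per_example_grads.shape[0]

    # 1. Clip every per-example gradient to L2 norm <= clip_norm (C).
    norms = np.linalg.norm(per_example_grads, axis=1)
    scale = np.minimum(1.0, clip_norm / np.maximum(norms, 1e-12))
    clipped = per_example_grads * scale[:, None]

    # 2. Sum the clipped gradients and add noise with std = sigma * C.
    noisy_sum = clipped.sum(axis=0) + rng.normal(
        0.0, noise_multiplier * clip_norm, size=params.shape
    )

    # 3. Average over the batch and take an ordinary SGD step.
    return params - learning_rate * (noisy_sum / batch_size)

# Toy usage: a batch of 128 per-example gradients for a 10-parameter model.
rng = np.random.default_rng(0)
grads = rng.normal(size=(128, 10))
params = np.zeros(10)
params = dp_sgd_step(grads, params, clip_norm=1.0, noise_multiplier=1.1,
                     learning_rate=0.01, rng=rng)
```

The clipping bound C and noise standard deviation σ·C are exactly the knobs exposed by the playground's sliders, which is why lowering σ improves accuracy while weakening the privacy guarantee.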
### Research Basis
The simulation parameters and accuracy ranges are based on recent DP-SGD research:
- "TAN without a burn: Scaling Laws of DP-SGD" (2023)
- "Unlocking High-Accuracy Differentially Private Image Classification through Scale" (2022)
- "Differentially Private Generation of Small Images" (2020)

## Contributing

We welcome contributions! Areas for improvement:
- Additional datasets beyond MNIST
- More sophisticated privacy accounting methods
- Enhanced visualizations
- Better mobile responsiveness

## License

This project is licensed under the MIT License - see the LICENSE file for details.

## Acknowledgments

- TensorFlow Privacy team for DP-SGD implementation
- Research community for privacy-preserving ML advances
- Chart.js for excellent visualization capabilities
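## Appendix: Estimating ε Offline

If you want to double-check the privacy budget reported in the UI, the hedged sketch below estimates ε for a given parameter setting with the TensorFlow Privacy accountant acknowledged above. It is not the playground's own accounting code: the import path of `compute_dp_sgd_privacy` has moved between `tensorflow-privacy` releases, and the dataset size, δ, and hyperparameters shown are illustrative.

```python
# Assumes `pip install tensorflow-privacy`. In older releases the same helper
# is exposed as `tensorflow_privacy.compute_dp_sgd_privacy`; adjust the import
# to match your installed version.
from tensorflow_privacy.privacy.analysis.compute_dp_sgd_privacy_lib import (
    compute_dp_sgd_privacy,
)

# Illustrative values mirroring the recommended ranges in this README.
eps, opt_order = compute_dp_sgd_privacy(
    n=60_000,              # MNIST training-set size
    batch_size=128,        # sampling rate q = 128 / 60,000
    noise_multiplier=1.1,  # σ
    epochs=10,
    delta=1e-5,            # δ, conventionally below 1/n
)
print(f"Estimated budget: ε ≈ {eps:.2f} at δ = 1e-5 (optimal RDP order {opt_order})")
```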