---
|
title: DP-SGD Interactive Playground |
|
emoji: 🛡️ |
|
colorFrom: blue |
|
colorTo: purple |
|
sdk: docker |
|
pinned: false |
|
--- |
|
|
|
# DP-SGD Interactive Playground |
|
|
|
An interactive web application for exploring Differentially Private Stochastic Gradient Descent (DP-SGD) training. This tool helps users understand the privacy-utility trade-offs in privacy-preserving machine learning through realistic simulations and visualizations. |
|
|
|
## 🚀 Recent Improvements (v2.0) |
|
|
|
### Enhanced Chart Visualization |
|
- **Clearer dual-axis charts**: Improved color coding and styling to distinguish accuracy (green, solid line) from loss (red, dashed line) |
|
- **Better scaling**: Separate colored axes with appropriate ranges (0-100% for accuracy, 0-3 for loss) |
|
- **Enhanced tooltips**: Richer, better-formatted hover details
|
- **Visual differentiation**: Added point styles, line weights, and backgrounds for clarity |
|
|
|
### Realistic DP-SGD Training Data |
|
- **Research-based accuracy ranges**: |
|
- ε=1: 60-72% accuracy (high privacy) |
|
- ε=2-3: 75-85% accuracy (balanced) |
|
- ε=8: 85-90% accuracy (lower privacy) |
|
- **Consistent training progress**: Final metrics now match training chart progression |
|
- **Realistic learning curves**: Exponential improvement with noise-dependent variation (see the sketch after this list)
|
- **Proper privacy degradation**: Higher noise multipliers visibly reduce achievable accuracy
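
As a sketch of how such a noise-aware mock curve could be produced (the function name and constants below are illustrative, not the app's actual implementation):

```python
import numpy as np

def mock_accuracy_curve(epochs, noise_multiplier, seed=0):
    """Illustrative learning curve: exponential improvement toward a plateau
    that drops, and jitters more, as the noise multiplier grows."""
    rng = np.random.default_rng(seed)
    # Higher sigma -> lower achievable accuracy ceiling (hypothetical constants).
    ceiling = 0.90 - 0.12 * max(noise_multiplier - 0.8, 0.0)
    t = np.arange(1, epochs + 1)
    curve = ceiling * (1 - np.exp(-t / 4.0))                 # exponential improvement
    jitter = rng.normal(0, 0.01 * noise_multiplier, epochs)  # noise-dependent variation
    return np.clip(curve + jitter, 0.0, 1.0)

print(mock_accuracy_curve(epochs=10, noise_multiplier=1.0).round(3))
```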
|
|
|
### Improved Parameter Recommendations |
|
- **Noise multiplier guidance**: Optimal range σ = 0.8-1.5 for good trade-offs |
|
- **Batch size recommendations**: ≥128 for DP-SGD stability |
|
- **Learning rate advice**: ≤0.02 for noisy training environments |
|
- **Epochs guidance**: 8-20 epochs to balance convergence against privacy cost (these rules of thumb are sketched below)
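
A minimal sketch of how these rules of thumb could be encoded (the function name and messages are hypothetical; see the app's source for the real recommendation logic):

```python
def recommend_parameters(noise_multiplier, batch_size, learning_rate, epochs):
    """Hypothetical checks mirroring the guidance above."""
    tips = []
    if not 0.8 <= noise_multiplier <= 1.5:
        tips.append("Keep sigma in 0.8-1.5 for a good privacy-utility trade-off.")
    if batch_size < 128:
        tips.append("Use a batch size of at least 128 for DP-SGD stability.")
    if learning_rate > 0.02:
        tips.append("Lower the learning rate to <= 0.02 for noisy training.")
    if not 8 <= epochs <= 20:
        tips.append("Train for 8-20 epochs to balance convergence and privacy cost.")
    return tips

print(recommend_parameters(noise_multiplier=2.0, batch_size=64,
                           learning_rate=0.05, epochs=30))
```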
|
|
|
### Dynamic Privacy-Utility Display |
|
- **Real-time privacy budget**: Shows the calculated ε for the current parameters (see the accountant example after this list)
|
- **Context-aware assessments**: Different recommendations based on achieved accuracy |
|
- **Educational messaging**: Helps users understand what constitutes good/poor trade-offs |
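
One common way to compute ε from the training parameters is TensorFlow Privacy's accountant. A minimal sketch, assuming `tensorflow-privacy` is installed (note that this import path has moved between releases):

```python
# pip install tensorflow-privacy
from tensorflow_privacy.privacy.analysis.compute_dp_sgd_privacy_lib import (
    compute_dp_sgd_privacy,
)

# MNIST-sized example: 60,000 training examples, delta = 1e-5.
eps, opt_order = compute_dp_sgd_privacy(
    n=60_000, batch_size=256, noise_multiplier=1.1, epochs=10, delta=1e-5
)
print(f"epsilon = {eps:.2f} (optimal RDP order {opt_order})")
```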
|
|
|
## Features |
|
|
|
- **Interactive Parameter Tuning**: Adjust clipping norm, noise multiplier, batch size, learning rate, and epochs |
|
- **Real-time Training**: Choose between mock simulation or actual MNIST training |
|
- **Multiple Visualizations**: |
|
- Training progress (accuracy/loss over epochs/iterations) |
|
- Gradient clipping visualization |
|
- Privacy budget tracking |
|
- **Smart Recommendations**: Get suggestions for improving your privacy-utility trade-off |
|
- **Educational Content**: Learn about DP-SGD concepts through interactive exploration |
|
|
|
## Quick Start |
|
|
|
### Prerequisites |
|
- Python 3.8+ |
|
- pip or conda |
|
|
|
### Installation |
|
|
|
1. Clone the repository: |
|
```bash |
|
git clone <repository-url> |
|
cd DPSGD |
|
``` |
|
|
|
2. Install dependencies: |
|
```bash |
|
pip install -r requirements.txt |
|
``` |
|
|
|
3. Run the application: |
|
```bash |
|
python3 run.py |
|
``` |
|
|
|
4. Open your browser and navigate to `http://127.0.0.1:5000` |
|
|
|
### Using the Application |
|
|
|
1. **Set Parameters**: Use the sliders to adjust DP-SGD parameters |
|
2. **Choose Training Mode**: Select between mock simulation (fast) or real MNIST training |
|
3. **Run Training**: Click "Run Training" to see results |
|
4. **Analyze Results**: |
|
- View training progress in the interactive charts |
|
- Check final metrics (accuracy, loss, privacy budget) |
|
- Read personalized recommendations |
|
5. **Experiment**: Try the "Use Optimal Parameters" button for research-backed settings |
|
|
|
## Understanding the Results |
|
|
|
### Chart Interpretation |
|
- **Green solid line**: Model accuracy (left y-axis, 0-100%) |
|
- **Red dashed line**: Training loss (right y-axis, 0-3) |
|
- **Privacy Budget (ε)**: Lower values = stronger privacy protection |
|
- **Consistent metrics**: Training progress matches final results |
|
|
|
### Recommended Parameter Ranges |
|
- **Clipping Norm (C)**: 1.0-2.0 (balance between privacy and utility) |
|
- **Noise Multiplier (σ)**: 0.8-1.5 (avoid σ > 2.0 for usable models) |
|
- **Batch Size**: 128+ (larger batches help with DP-SGD stability) |
|
- **Learning Rate**: 0.01-0.02 (conservative rates work better with noise) |
|
- **Epochs**: 8-20 (balance convergence vs privacy cost) |
|
|
|
### Privacy-Utility Trade-offs |
|
- **ε < 1**: Very strong privacy, expect 60-70% accuracy |
|
- **ε = 2-4**: Good privacy-utility balance, expect 75-85% accuracy |
|
- **ε > 8**: Weaker privacy, expect 85-90% accuracy |
|
|
|
## Technical Details |
|
|
|
### Architecture |
|
- **Backend**: Flask with TensorFlow/Keras for real training (see the endpoint sketch after this list)
|
- **Frontend**: Vanilla JavaScript with Chart.js for visualizations |
|
- **Training**: Supports both mock simulation and real DP-SGD with MNIST |
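
For orientation, a hedged sketch of what the training endpoint could look like (the route name, payload fields, and `run_training` stub are illustrative; consult the actual Flask app for the real interface):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

def run_training(**params):
    """Stub standing in for the app's mock/real DP-SGD training dispatch."""
    return {"accuracy": [0.61, 0.74, 0.81], "loss": [1.9, 1.1, 0.7],
            "epsilon": 2.4, "params": params}

@app.route("/api/train", methods=["POST"])  # hypothetical route name
def train():
    params = request.get_json(force=True)   # slider values from the frontend
    return jsonify(run_training(**params))  # consumed by the Chart.js charts

if __name__ == "__main__":
    app.run(host="127.0.0.1", port=5000)
```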
|
|
|
### Algorithms |
|
- **Real Training**: Implements simplified DP-SGD with per-example gradient clipping and Gaussian noise (sketched below)
|
- **Mock Training**: Research-based simulation reflecting actual DP-SGD behavior patterns |
|
- **Privacy Calculation**: RDP-based privacy budget estimation |
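
The core step, following Abadi et al.'s DP-SGD, is: clip each per-example gradient to L2 norm C, sum, add Gaussian noise with standard deviation σ·C, then average and apply the update. A condensed, illustrative TensorFlow version (not the app's exact code):

```python
import tensorflow as tf

def dp_sgd_step(model, optimizer, x_batch, y_batch,
                clip_norm=1.0, noise_multiplier=1.1):
    """One simplified DP-SGD step: per-example clipping + Gaussian noise."""
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    summed = [tf.zeros_like(v) for v in model.trainable_variables]

    batch_size = int(x_batch.shape[0])
    for i in range(batch_size):  # Python loop for clarity; real code vectorizes.
        with tf.GradientTape() as tape:
            loss = loss_fn(y_batch[i:i + 1], model(x_batch[i:i + 1], training=True))
        grads = tape.gradient(loss, model.trainable_variables)
        # Clip the whole per-example gradient to global L2 norm <= clip_norm.
        clipped, _ = tf.clip_by_global_norm(grads, clip_norm)
        summed = [s + g for s, g in zip(summed, clipped)]

    # Add noise with stddev sigma * C to the summed gradients, then average.
    noisy = [
        (s + tf.random.normal(tf.shape(s), stddev=noise_multiplier * clip_norm))
        / batch_size
        for s in summed
    ]
    optimizer.apply_gradients(zip(noisy, model.trainable_variables))
```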
|
|
|
### Research Basis |
|
The simulation parameters and accuracy ranges are based on recent DP-SGD research: |
|
- "TAN without a burn: Scaling Laws of DP-SGD" (2023) |
|
- "Unlocking High-Accuracy Differentially Private Image Classification through Scale" (2022) |
|
- "Differentially Private Generation of Small Images" (2020) |
|
|
|
## Contributing |
|
|
|
We welcome contributions! Areas for improvement: |
|
- Additional datasets beyond MNIST |
|
- More sophisticated privacy accounting methods |
|
- Enhanced visualizations |
|
- Better mobile responsiveness |
|
|
|
## License |
|
|
|
This project is licensed under the MIT License - see the LICENSE file for details. |
|
|
|
## Acknowledgments |
|
|
|
- TensorFlow Privacy team for DP-SGD implementation |
|
- Research community for privacy-preserving ML advances |
|
- Chart.js for excellent visualization capabilities |