---
title: Drift Detector
emoji: π
colorFrom: blue
colorTo: red
sdk: gradio
sdk_version: 5.33.0
app_file: app.py
pinned: false
license: mit
tags:
- mcp-server-track
- agent-demo-track
---
This was made with the combined efforts of Saransh Halwai (HF username: [Sars6](https://huggingface.co/Sars6)), Harsh Bhati (HF username: [HarshBhati](https://huggingface.co/HarshBhati)), and Anurag Prasad (HF username: [LegendXInfinity](https://huggingface.co/LegendXInfinity)).

GitHub repo: [Drift Detector](https://github.com/saranshhalwai/drift-detector)
# Drift Detector

Drift Detector is an MCP server designed to detect drift in LLM performance over time using the **sampling** functionality of MCP.

This implementation is intended as a **proof of concept** and is **NOT intended** for production use without significant changes.
## The Idea

The drift detector is a server that can be connected to any LLM client that supports MCP's sampling functionality.
It lets you monitor the performance of your LLM models over time and detect drift in their behavior.
This is particularly useful for applications where a model's performance may change due to factors such as shifts in the data distribution, model updates, or other external influences.
## How to run

To run the Drift Detector, you need to have Python installed on your machine. Follow these steps:

1. Clone the repository:
   ```bash
   git clone https://github.com/saranshhalwai/drift-detector
   cd drift-detector
   ```
2. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```
3. Start the server:
   ```bash
   gradio app.py
   ```
4. Open your web browser and navigate to `http://localhost:7860` to access the Drift Detector interface.
## Interface

The interface consists of the following components:

- **Model Selection** - A panel allowing you to:
  - Select models from a dropdown list
  - Search for models by name or description
  - Create new models with custom system prompts
  - Enhance prompts with AI assistance
- **Model Operations** - A tabbed interface with:
  - **Chatbot** - Interact with the selected model through a conversational interface
  - **Drift Analysis** - Analyze and visualize model drift over time, including:
    - Calculating new drift scores for the selected model
    - Viewing historical drift data in JSON format
    - Visualizing drift trends through interactive charts

The drift detection functionality allows you to track changes in model performance over time, which is essential for monitoring and maintaining model quality.
## Under the Hood

Our GitHub repo consists of two main components:

- **Drift Detector Server**
  A low-level MCP server that detects drift in the LLM performance of the connected client.
- **Target Client**
  A client implemented using the fast-agent library, which connects to the Drift Detector server and demonstrates its functionality.

The Gradio interface in [app.py](app.py) is an example dashboard that allows users to interact with the Drift Detector server and visualize drift data.
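For illustration only, here is a minimal sketch of what a sampling-capable client looks like, written against the official MCP Python SDK rather than fast-agent (the `server.py` path, the model name, and the stubbed `sampling_callback` below are hypothetical stand-ins, not the code in this repo):

```python
import asyncio

from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client
from mcp.shared.context import RequestContext
from mcp.types import CreateMessageRequestParams, CreateMessageResult, TextContent


async def sampling_callback(
    context: RequestContext, params: CreateMessageRequestParams
) -> CreateMessageResult:
    # Hypothetical handler: a real client would forward params.messages
    # to its LLM and return the model's actual reply.
    return CreateMessageResult(
        role="assistant",
        content=TextContent(type="text", text="stub response"),
        model="stub-model",
        stopReason="endTurn",
    )


async def main() -> None:
    server = StdioServerParameters(command="python", args=["server.py"])
    async with stdio_client(server) as (read, write):
        async with ClientSession(
            read, write, sampling_callback=sampling_callback
        ) as session:
            await session.initialize()
            # Invoke the server's drift tools (described in the next section).
            await session.call_tool(
                "run_initial_diagnostics",
                {"model": "stub-model", "model_capabilities": "general chat"},
            )
            result = await session.call_tool("check_drift", {"model": "stub-model"})
            print(result)


asyncio.run(main())
```

Registering a `sampling_callback` is what lets the server "borrow" the client's LLM: every question the server asks is answered by the model on the client side.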
### Database Integration

The system uses SQLite (by default) to store:

- Model information (name, capabilities, creation date)
- Drift history (date and score for each drift calculation)
- Diagnostic data (baseline and current questions/answers)

This enables persistent tracking of model performance over time, allowing for:

- Historical trend analysis
- Comparison between different models
- Early detection of performance degradation
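The actual schema lives in the repository; as a hypothetical illustration of the three kinds of records listed above, a minimal SQLite layout might look like this (all table and column names are assumptions):

```python
import sqlite3

# Hypothetical schema; the real one is defined in the repository.
conn = sqlite3.connect("drift.db")
conn.executescript(
    """
    CREATE TABLE IF NOT EXISTS models (
        name         TEXT PRIMARY KEY,
        capabilities TEXT NOT NULL,
        created_at   TEXT DEFAULT CURRENT_TIMESTAMP
    );
    CREATE TABLE IF NOT EXISTS drift_history (
        model      TEXT REFERENCES models(name),
        checked_at TEXT DEFAULT CURRENT_TIMESTAMP,
        score      REAL NOT NULL  -- 0-100, higher means more drift
    );
    CREATE TABLE IF NOT EXISTS diagnostics (
        model   TEXT REFERENCES models(name),
        kind    TEXT CHECK (kind IN ('baseline', 'current')),
        qa_json TEXT NOT NULL  -- paired question-answer JSON records
    );
    """
)
conn.commit()
```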
### Drift Detector Server

The Drift Detector server is implemented using the MCP Python SDK.
It exposes the following tools:

1. **run_initial_diagnostics**
   - **Purpose**: Establishes a baseline for model behavior using adaptive sampling techniques
   - **Parameters**:
     - `model`: The name of the model to run diagnostics on
     - `model_capabilities`: Full description of the model's capabilities and special features
   - **Sampling Process**:
     - First generates a tailored questionnaire based on model-specific capabilities
     - Collects responses by sampling the target model with controlled parameters (temperature=0.7)
     - Each question is processed individually to ensure proper context isolation
     - Baseline samples are stored as paired question-answer JSON records for future comparison
   - **Output**: Confirmation message indicating successful baseline creation
2. **check_drift**
   - **Purpose**: Measures potential drift by comparative sampling against the baseline
   - **Parameters**:
     - `model`: The name of the model to check for drift
   - **Sampling Process**:
     - Retrieves the original questions from the baseline
     - Re-samples the model with identical questions using the same sampling parameters
     - Maintains consistent context conditions to ensure fair comparison
     - Uses differential analysis to compare semantic and functional differences between sample sets
   - **Drift Evaluation**:
     - Calculates a numerical drift score based on answer divergence
     - Provides threshold-based alerts when drift exceeds acceptable limits (score > 50)
     - Stores the latest sample responses for audit and trend analysis
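As a rough, non-authoritative sketch of the sampling side of `check_drift`, here is how a tool can request completions from the connected client. It uses FastMCP from the MCP Python SDK rather than the low-level server API the repo actually uses, and the `load_baseline_questions` and `score_divergence` helpers are hypothetical stubs:

```python
from mcp.server.fastmcp import Context, FastMCP
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("drift-detector")


def load_baseline_questions(model: str) -> list[str]:
    # Hypothetical stand-in for the database lookup of baseline questions.
    return ["What is 2 + 2?", "Summarize the plot of Hamlet in one sentence."]


def score_divergence(model: str, answers: list[str]) -> float:
    # Hypothetical stand-in for the differential-analysis scoring step.
    return 0.0


async def ask(ctx: Context, question: str) -> str:
    # MCP sampling: the *client's* LLM produces the answer. One question
    # per request keeps each answer in an isolated context.
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(type="text", text=question),
            )
        ],
        max_tokens=512,
        temperature=0.7,  # same controlled parameters at baseline and re-check
    )
    return result.content.text if isinstance(result.content, TextContent) else ""


@mcp.tool()
async def check_drift(model: str, ctx: Context) -> str:
    questions = load_baseline_questions(model)
    answers = [await ask(ctx, q) for q in questions]
    score = score_divergence(model, answers)
    alert = " (ALERT: drift exceeds threshold)" if score > 50 else ""
    return f"Drift score for {model}: {score}{alert}"


if __name__ == "__main__":
    mcp.run()
```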
## Flow

The intended flow is as follows:

1. When the client contacts the server for the first time, it runs the `run_initial_diagnostics` tool.
2. The server generates a tailored questionnaire based on the model's capabilities.
3. This questionnaire is used to collect responses from the model, establishing a baseline for future comparisons.
4. Once the baseline is established, the server stores the paired question-answer JSON records in the database.
5. The client can then use the `check_drift` tool to measure potential drift in the model's performance.
6. The server retrieves the original questions from the baseline and re-samples the model with identical questions, maintaining consistent context conditions to ensure a fair comparison.
7. If significant drift is detected (score > 50), the server raises an alert and stores the latest sample responses for audit and trend analysis.
8. The client can visualize the drift data through the Gradio interface, allowing users to track changes in model performance over time.
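The repo's actual divergence metric is not spelled out here. Purely as a toy illustration of "compare baseline answers with re-sampled answers and map the divergence to a 0-100 score", one could use a string-similarity stand-in like this (the real analysis compares semantic and functional differences, not raw string overlap):

```python
from difflib import SequenceMatcher


def drift_score(baseline_answers: list[str], new_answers: list[str]) -> float:
    """Toy 0-100 drift score: average textual dissimilarity per question."""
    ratios = [
        SequenceMatcher(None, old, new).ratio()
        for old, new in zip(baseline_answers, new_answers)
    ]
    dissimilarity = 1.0 - sum(ratios) / len(ratios)
    return round(100.0 * dissimilarity, 2)


# Fabricated example answers, for demonstration only.
baseline = ["Paris is the capital of France.", "2 + 2 equals 4."]
current = ["The capital of France is Paris.", "2 + 2 = 4."]
score = drift_score(baseline, current)
print(score, "ALERT" if score > 50 else "ok")
```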
## Drift History Visualization

The system provides comprehensive visualization of drift history:

1. **Historical Data**: Drift history is fetched from the database rather than mock data
2. **Interactive Charts**: Drift scores are plotted over time to identify trends
3. **Threshold Indicators**: Visual indicators show when drift exceeds acceptable limits
4. **Data Conversion**: Drift scores are normalized to percentages (0-100) for consistent display
5. **Error Handling**: Robust error handling for missing or malformed data

This visualization allows users to:

- Identify gradual performance degradation
- Spot sudden changes in model behavior
- Make informed decisions about model retraining or replacement
- Compare drift patterns across different deployment environments
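As a minimal sketch of a score-over-time chart with a threshold indicator (this is not the dashboard code in app.py, and the history list below is fabricated sample data):

```python
import matplotlib.pyplot as plt

# Fabricated sample history: (check date, drift score 0-100) pairs.
history = [("2025-06-01", 8.2), ("2025-06-08", 14.5), ("2025-06-15", 57.3)]
dates, scores = zip(*history)

fig, ax = plt.subplots()
ax.plot(dates, scores, marker="o", label="drift score")
ax.axhline(50, color="red", linestyle="--", label="alert threshold")
ax.set_xlabel("check date")
ax.set_ylabel("drift score (%)")
ax.legend()
plt.show()
```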
## Future Improvements

Potential enhancements for the Drift Detector include:

1. A full MCP server hosted in the cloud
2. Authentication and authorization for secure access
3. Support for multiple database backends (PostgreSQL, MySQL)
4. Enhanced analytics and reporting features
5. Integration with CI/CD pipelines for automated monitoring
6. Advanced drift detection algorithms with explainability
7. Multi-metric drift analysis (beyond a single drift score)
8. User role-based access control for enterprise environments
## Demo Video

[]