---
title: AI Evaluation Dashboard
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 3000
---

# AI Evaluation Dashboard

This repository is a Next.js application for viewing and authoring AI evaluations. It provides a comprehensive platform for documenting and sharing AI system evaluations across multiple dimensions, including capabilities and risks.

## Project Goals

The AI Evaluation Dashboard aims to:

- **Standardize AI evaluation reporting** across different AI systems and models
- **Facilitate transparency** by providing detailed evaluation cards for AI systems
- **Enable comparative analysis** of AI capabilities and risks
- **Support research and policy** by consolidating evaluation data in an accessible format
- **Promote responsible AI development** through comprehensive risk assessment

## For External Collaborators

### Making Changes to Evaluation Categories and Schema

All evaluation categories, form fields, and data structures are centrally managed in the `schema/` folder. **This is the primary location for making structural changes to the evaluation framework.**

Key schema files:

- **`schema/evaluation-schema.json`** - Defines all evaluation categories (capabilities and risks)
- **`schema/output-schema.json`** - Defines the complete data structure for evaluation outputs
- **`schema/system-info-schema.json`** - Defines form field options for system information
- **`schema/category-details.json`** - Contains detailed descriptions and criteria for each category
- **`schema/form-hints.json`** - Provides help text and guidance for form fields

### Standards and Frameworks Used

The evaluation framework is based on established standards:

- **Risk categories** are derived from **NIST AI 600-1** (the Generative AI Profile of the NIST AI Risk Management Framework)
- **Capability categories** are based on the **OECD AI Classification Framework**

This ensures consistency with international AI governance standards and facilitates interoperability with other evaluation systems.

### Contributing Evaluation Data

Evaluation data files are stored in `public/evaluations/` as JSON files. Each file represents a complete evaluation of an AI system and must conform to the schema defined in `schema/output-schema.json`.

To add a new evaluation:

1. Create a new JSON file in `public/evaluations/`
2. Follow the structure defined in `schema/output-schema.json`
3. Ensure all required fields are populated
4. Validate against the schema before submission (see the sketch below)
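
A minimal way to run the validation in step 4, assuming Node.js is available; `ajv-cli` is not part of this repository and the evaluation file name is a placeholder. Depending on the JSON Schema draft the schemas use, an extra `--spec` flag may be needed.

```bash
# Install the validator locally without touching package.json (illustrative only)
npm install --no-save ajv-cli

# Validate a new evaluation file (placeholder name) against the output schema
npx ajv validate -s schema/output-schema.json -d public/evaluations/my-evaluation.json
```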

### Development Setup

#### Run locally

Install dependencies and run the dev server:

```bash
npm ci
npm run dev
```

Build for production and run:

```bash
npm ci
npm run build
NODE_ENV=production PORT=3000 npm run start
```

## Docker (recommended for Hugging Face Spaces)

A `Dockerfile` is included for deploying this app as a dynamic service on Hugging Face Spaces (Docker runtime).

Build the image locally:

```bash
docker build -t ai-eval-dashboard .
```

Run the container (expose port 3000):

```bash
docker run -p 3000:3000 -e HF_TOKEN="$HF_TOKEN" ai-eval-dashboard
```

Visit `http://localhost:3000` to verify.
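
For a quick command-line smoke test instead of opening a browser (assumes the container is running on the default port):

```bash
# Expect an HTTP 200 response from the app's root page
curl -I http://localhost:3000
```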

### Deploy to Hugging Face Spaces

1. Create a new Space at https://huggingface.co/new-space and choose **Docker** as the runtime.
2. Push this repository to the Space Git (or upload files through the UI); a typical push is sketched below. The Space will build the Docker image using the included `Dockerfile` and serve your app on port 3000.
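
Pushing from an existing clone might look like the following sketch; the remote URL is a placeholder, so substitute your own username and Space name:

```bash
# Add the Space as a git remote (placeholder URL) and push the default branch
git remote add space https://huggingface.co/spaces/<your-username>/<your-space-name>
git push space main
```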

Notes:

- If your build needs native dependencies (e.g. `sharp`), the Docker image may require extra apt packages; update the `Dockerfile` accordingly (see the sketch below).
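
As an illustration only, and assuming the base image is Debian-based (e.g. a `node:*-slim` image), extra packages could be installed before `npm ci` runs; `libvips-dev` is the library `sharp` typically builds against and is an assumption here, not something this project necessarily needs:

```dockerfile
# Hypothetical addition for native modules such as `sharp`:
# install build tools and libvips, then clean up the apt cache
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential libvips-dev \
    && rm -rf /var/lib/apt/lists/*
```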