---
title: AI Evaluation Dashboard
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 3000
---

# AI Evaluation Dashboard

This repository contains a Next.js application for viewing and authoring AI evaluations. It provides a platform for documenting and sharing AI system evaluations across multiple dimensions, including capabilities and risks.

## Project Goals

The AI Evaluation Dashboard aims to:

- Standardize AI evaluation reporting across different AI systems and models
- Facilitate transparency by providing detailed evaluation cards for AI systems
- Enable comparative analysis of AI capabilities and risks
- Support research and policy by consolidating evaluation data in an accessible format
- Promote responsible AI development through comprehensive risk assessment

## For External Collaborators

### Making Changes to Evaluation Categories and Schema

All evaluation categories, form fields, and data structures are centrally managed in the `schema/` folder. This is the primary location for making structural changes to the evaluation framework.

Key schema files:

- `schema/evaluation-schema.json` - Defines all evaluation categories (capabilities and risks)
- `schema/output-schema.json` - Defines the complete data structure for evaluation outputs
- `schema/system-info-schema.json` - Defines form field options for system information
- `schema/category-details.json` - Contains detailed descriptions and criteria for each category
- `schema/form-hints.json` - Provides help text and guidance for form fields
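
For a quick look at a schema's top-level structure from the command line, one option is `jq` (not required by the project; any JSON viewer works just as well):

```bash
# List the top-level keys of the output schema (requires jq).
jq 'keys' schema/output-schema.json
```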

### Standards and Frameworks Used

The evaluation framework is based on established standards:

- Risk categories are derived from NIST AI 600-1, the Generative AI Profile of the NIST AI Risk Management Framework
- Capability categories are based on the OECD Framework for the Classification of AI Systems

This ensures consistency with international AI governance standards and facilitates interoperability with other evaluation systems.

### Contributing Evaluation Data

Evaluation data files are stored in `public/evaluations/` as JSON files. Each file represents a complete evaluation of an AI system and must conform to the schema defined in `schema/output-schema.json`.

To add a new evaluation:

1. Create a new JSON file in `public/evaluations/`
2. Follow the structure defined in `schema/output-schema.json`
3. Ensure all required fields are populated
4. Validate against the schema before submission (see the validation sketch below)
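
The repository may not ship a validation command of its own, so here is a minimal sketch of checking a draft file from the command line, assuming Node.js is available and using the third-party `ajv-cli` package; the file name `my-eval.json` is a placeholder:

```bash
# Validate a draft evaluation against the output schema (ajv-cli is not part of this project).
npx ajv-cli validate -s schema/output-schema.json -d public/evaluations/my-eval.json
```

Depending on which JSON Schema draft `output-schema.json` declares, ajv-cli may need an explicit `--spec` option; check its documentation if the schema fails to load.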

## Development Setup

### Run locally

Install dependencies and run the dev server:

```bash
npm ci
npm run dev
```

Build for production and run:

```bash
npm ci
npm run build
NODE_ENV=production PORT=3000 npm run start
```

### Docker (recommended for Hugging Face Spaces)

A `Dockerfile` is included for deploying this app as a dynamic service on Hugging Face Spaces (Docker runtime).

Build the image locally:

```bash
docker build -t ai-eval-dashboard .
```

Run the container (expose port 3000):

```bash
docker run -p 3000:3000 -e HF_TOKEN="$HF_TOKEN" ai-eval-dashboard
```

Visit http://localhost:3000 to verify.

### Deploy to Hugging Face Spaces

1. Create a new Space at https://huggingface.co/new-space and choose Docker as the runtime.
2. Push this repository to the Space's Git remote (or upload files through the UI); an example push is sketched below. The Space builds the Docker image from the included `Dockerfile` and serves the app on port 3000.
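
A minimal sketch of pushing an existing local clone, where `<username>` and `<space-name>` are placeholders for your own account and Space:

```bash
# Add the Space as an additional Git remote and push; you may be asked to
# authenticate, e.g. with a Hugging Face access token used as the password.
git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main
```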

Notes:

- If your build needs native dependencies (e.g. `sharp`), the Docker image may require extra apt packages; update the `Dockerfile` accordingly (a sketch follows below).
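
As an illustration only (the exact packages depend on your native modules, and this assumes a Debian-based base image), an extra layer in the `Dockerfile` could look like:

```dockerfile
# Illustrative only: install build tools and libvips, which native modules such as sharp may need.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential libvips-dev \
    && rm -rf /var/lib/apt/lists/*
```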