---
title: AI Evaluation Dashboard
emoji: 📊
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 3000
---

# AI Evaluation Dashboard

This repository contains a Next.js application for viewing and authoring AI evaluations. It provides a platform for documenting and sharing AI system evaluations across multiple dimensions, including capabilities and risks.

## Project Goals

The AI Evaluation Dashboard aims to:

- Standardize AI evaluation reporting across different AI systems and models
- Facilitate transparency by providing detailed evaluation cards for AI systems
- Enable comparative analysis of AI capabilities and risks
- Support research and policy by consolidating evaluation data in an accessible format
- Promote responsible AI development through comprehensive risk assessment

## For External Collaborators

### Making Changes to Evaluation Categories and Schema

All evaluation categories, form fields, and data structures are centrally managed in the `schema/` folder. This is the primary location for making structural changes to the evaluation framework.

Key schema files:

- `schema/evaluation-schema.json` - Defines all evaluation categories (capabilities and risks)
- `schema/output-schema.json` - Defines the complete data structure for evaluation outputs
- `schema/system-info-schema.json` - Defines form field options for system information
- `schema/category-details.json` - Contains detailed descriptions and criteria for each category
- `schema/form-hints.json` - Provides help text and guidance for form fields
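
For a quick look at a schema's top-level structure from the command line, one option is `jq` (not required by the project; any JSON viewer works just as well):

```bash
# List the top-level keys of the output schema (requires jq).
jq 'keys' schema/output-schema.json
```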

### Standards and Frameworks Used

The evaluation framework is based on established standards:

- Risk categories are derived from NIST AI 600-1, the Generative AI Profile of the NIST AI Risk Management Framework
- Capability categories are based on the OECD Framework for the Classification of AI Systems

This ensures consistency with international AI governance standards and facilitates interoperability with other evaluation systems.

### Contributing Evaluation Data

Evaluation data files are stored in `public/evaluations/` as JSON files. Each file represents a complete evaluation of an AI system and must conform to the schema defined in `schema/output-schema.json`.

To add a new evaluation:

1. Create a new JSON file in `public/evaluations/`
2. Follow the structure defined in `schema/output-schema.json`
3. Ensure all required fields are populated
4. Validate against the schema before submission (see the validation sketch below)
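
The repository may not ship a validation command of its own, so here is a minimal sketch of checking a draft file from the command line, assuming Node.js is available and using the third-party `ajv-cli` package; the file name `my-eval.json` is a placeholder:

```bash
# Validate a draft evaluation against the output schema (ajv-cli is not part of this project).
npx ajv-cli validate -s schema/output-schema.json -d public/evaluations/my-eval.json
```

Depending on which JSON Schema draft `output-schema.json` declares, ajv-cli may need an explicit `--spec` option; check its documentation if the schema fails to load.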

## Development Setup

### Run locally

Install dependencies and run the dev server:

```bash
npm ci
npm run dev
```

Build for production and run:

```bash
npm ci
npm run build
NODE_ENV=production PORT=3000 npm run start
```

### Docker (recommended for Hugging Face Spaces)

A `Dockerfile` is included for deploying this app as a dynamic service on Hugging Face Spaces (Docker runtime).

Build the image locally:

```bash
docker build -t ai-eval-dashboard .
```

Run the container (expose port 3000):

```bash
docker run -p 3000:3000 -e HF_TOKEN="$HF_TOKEN" ai-eval-dashboard
```

Visit http://localhost:3000 to verify.

### Deploy to Hugging Face Spaces

1. Create a new Space at https://huggingface.co/new-space and choose Docker as the runtime.
2. Push this repository to the Space's Git remote (or upload files through the UI); an example push is sketched below. The Space builds the Docker image from the included `Dockerfile` and serves the app on port 3000.
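
A minimal sketch of pushing an existing local clone, where `<username>` and `<space-name>` are placeholders for your own account and Space:

```bash
# Add the Space as an additional Git remote and push; you may be asked to
# authenticate, e.g. with a Hugging Face access token used as the password.
git remote add space https://huggingface.co/spaces/<username>/<space-name>
git push space main
```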

Notes:

- If your build needs native dependencies (e.g. `sharp`), the Docker image may require extra apt packages; update the `Dockerfile` accordingly (a sketch follows below).
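
As an illustration only (the exact packages depend on your native modules, and this assumes a Debian-based base image), an extra layer in the `Dockerfile` could look like:

```dockerfile
# Illustrative only: install build tools and libvips, which native modules such as sharp may need.
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential libvips-dev \
    && rm -rf /var/lib/apt/lists/*
```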