---
title: AI Evaluation Dashboard
emoji: π
colorFrom: blue
colorTo: indigo
sdk: docker
pinned: false
app_port: 3000
---

# AI Evaluation Dashboard

This repository is a Next.js application for viewing and authoring AI evaluations. It provides a comprehensive platform for documenting and sharing AI system evaluations across multiple dimensions, including capabilities and risks.

## Project Goals

The AI Evaluation Dashboard aims to:

- **Standardize AI evaluation reporting** across different AI systems and models
- **Facilitate transparency** by providing detailed evaluation cards for AI systems
- **Enable comparative analysis** of AI capabilities and risks
- **Support research and policy** by consolidating evaluation data in an accessible format
- **Promote responsible AI development** through comprehensive risk assessment

## For External Collaborators

### Making Changes to Evaluation Categories and Schema

All evaluation categories, form fields, and data structures are centrally managed in the `schema/` folder. **This is the primary location for making structural changes to the evaluation framework.**

Key schema files:

- **`schema/evaluation-schema.json`** - Defines all evaluation categories (capabilities and risks)
- **`schema/output-schema.json`** - Defines the complete data structure for evaluation outputs
- **`schema/system-info-schema.json`** - Defines form field options for system information
- **`schema/category-details.json`** - Contains detailed descriptions and criteria for each category
- **`schema/form-hints.json`** - Provides help text and guidance for form fields

### Standards and Frameworks Used

The evaluation framework is based on established standards:

- **Risk categories** are derived from **NIST AI 600-1** (the Generative AI Profile of the NIST AI Risk Management Framework)
- **Capability categories** are based on the **OECD AI Classification Framework**

This ensures consistency with international AI governance standards and facilitates interoperability with other evaluation systems.

### Contributing Evaluation Data

Evaluation data files are stored in `public/evaluations/` as JSON files. Each file represents a complete evaluation of an AI system and must conform to the schema defined in `schema/output-schema.json`.

To add a new evaluation:

1. Create a new JSON file in `public/evaluations/`
2. Follow the structure defined in `schema/output-schema.json`
3. Ensure all required fields are populated
4. Validate against the schema before submission (see the sketch below)
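
A minimal way to run the validation in step 4, assuming Node.js is available; `ajv-cli` is not part of this repository and the evaluation file name is a placeholder. Depending on the JSON Schema draft the schemas use, an extra `--spec` flag may be needed.

```bash
# Install the validator locally without touching package.json (illustrative only)
npm install --no-save ajv-cli

# Validate a new evaluation file (placeholder name) against the output schema
npx ajv validate -s schema/output-schema.json -d public/evaluations/my-evaluation.json
```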

### Development Setup

#### Run locally

Install dependencies and run the dev server:

```bash
npm ci
npm run dev
```

Build for production and run:

```bash
npm ci
npm run build
NODE_ENV=production PORT=3000 npm run start
```

## Docker (recommended for Hugging Face Spaces)

A `Dockerfile` is included for deploying this app as a dynamic service on Hugging Face Spaces (Docker runtime).

Build the image locally:

```bash
docker build -t ai-eval-dashboard .
```

Run the container (expose port 3000):

```bash
docker run -p 3000:3000 -e HF_TOKEN="$HF_TOKEN" ai-eval-dashboard
```

Visit `http://localhost:3000` to verify.
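
For a quick command-line smoke test instead of opening a browser (assumes the container is running on the default port):

```bash
# Expect an HTTP 200 response from the app's root page
curl -I http://localhost:3000
```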

### Deploy to Hugging Face Spaces

1. Create a new Space at https://huggingface.co/new-space and choose **Docker** as the runtime.
2. Push this repository to the Space Git (or upload files through the UI); a typical push is sketched below. The Space will build the Docker image using the included `Dockerfile` and serve your app on port 3000.
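
Pushing from an existing clone might look like the following sketch; the remote URL is a placeholder, so substitute your own username and Space name:

```bash
# Add the Space as a git remote (placeholder URL) and push the default branch
git remote add space https://huggingface.co/spaces/<your-username>/<your-space-name>
git push space main
```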

Notes:

- If your build needs native dependencies (e.g. `sharp`), the Docker image may require extra apt packages; update the `Dockerfile` accordingly (see the sketch below).
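
As an illustration only, and assuming the base image is Debian-based (e.g. a `node:*-slim` image), extra packages could be installed before `npm ci` runs; `libvips-dev` is the library `sharp` typically builds against and is an assumption here, not something this project necessarily needs:

```dockerfile
# Hypothetical addition for native modules such as `sharp`:
# install build tools and libvips, then clean up the apt cache
RUN apt-get update \
    && apt-get install -y --no-install-recommends build-essential libvips-dev \
    && rm -rf /var/lib/apt/lists/*
```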