Benchmarking

This directory contains scripts and resources for evaluating LLM alignment and safety with benchmarks such as MACHIAVELLI and SALAD-Bench.

Structure

  • benchmarks/: Contains specific benchmark datasets or access scripts.
  • evaluation_scripts/: Scripts to run the models against the benchmarks.
  • results/: Stores the output/results from benchmark runs.
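
As a rough illustration of how the three directories could fit together, here is a minimal sketch. It is not the project's actual evaluation script: the JSONL file layout, the `id` and `prompt` field names, the `run_benchmark` helper, and the `<name>_results.json` output name are all assumptions made for the example, and `model` stands in for whatever callable wraps the LLM under test.

```python
import json
from pathlib import Path
from typing import Callable, Dict, List


def load_prompts(path: Path) -> List[Dict]:
    """Read one JSON object per line (assumed JSONL layout, hypothetical)."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def run_benchmark(name: str, model: Callable[[str], str], root: Path = Path(".")) -> Path:
    """Run `model` on every prompt of a benchmark and save the responses.

    Assumes prompts live in benchmarks/<name>.jsonl with "id" and "prompt"
    fields (hypothetical), and writes results/<name>_results.json.
    """
    prompts = load_prompts(root / "benchmarks" / f"{name}.jsonl")

    # Collect one record per prompt so results can be scored later.
    records = [
        {"id": p["id"], "prompt": p["prompt"], "response": model(p["prompt"])}
        for p in prompts
    ]

    out_dir = root / "results"
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{name}_results.json"
    out_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return out_path
```

A script in evaluation_scripts/ could then call `run_benchmark("some_benchmark", my_model)`, where `my_model` is any function that maps a prompt string to a response string. Scoring is deliberately left out, since each benchmark defines its own metrics.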

Usage

(Instructions on how to run evaluations will go here)