Benchmarking
This directory contains scripts and resources for evaluating LLM alignment and safety using benchmarks such as MACHIAVELLI and SALAD-Bench.
Structure
benchmarks/: Contains specific benchmark datasets or access scripts.
evaluation_scripts/: Scripts to run the models against the benchmarks.
results/: Stores the output/results from benchmark runs.
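As a rough illustration of how these three directories could fit together, the sketch below runs a model over a list of prompts and writes the responses to a results folder. Everything in it is an assumption made for illustration: `run_benchmark`, `echo_model`, the JSON record layout, and the `demo_run` file name are hypothetical and are not taken from the scripts in this repository.

```python
import json
import tempfile
from pathlib import Path
from typing import Callable


def run_benchmark(
    prompts: list[str],
    model: Callable[[str], str],
    results_dir: Path,
    run_name: str,
) -> Path:
    """Run `model` on each prompt and write the responses to results_dir/<run_name>.json."""
    # Create the results directory if it does not exist yet.
    results_dir.mkdir(parents=True, exist_ok=True)
    # One record per prompt; the field names are hypothetical.
    records = [{"prompt": p, "response": model(p)} for p in prompts]
    out_path = results_dir / f"{run_name}.json"
    out_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return out_path


def echo_model(prompt: str) -> str:
    # Stand-in for a real LLM call; replace with an actual model client.
    return f"(placeholder response to: {prompt})"


if __name__ == "__main__":
    # Use a temporary directory so the sketch does not touch a real results/ folder.
    with tempfile.TemporaryDirectory() as tmp:
        path = run_benchmark(["Example prompt"], echo_model, Path(tmp) / "results", "demo_run")
        print(path.name)
```

In a real run, the prompts would presumably come from a loader under benchmarks/, the model callable would wrap an actual LLM client, and `results_dir` would point at results/.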
Usage
(Instructions on how to run evaluations will go here)