# Benchmarking
This directory contains scripts and resources for evaluating LLM alignment and safety on benchmarks such as MACHIAVELLI and SALAD-Bench.
## Structure
- `benchmarks/`: Benchmark datasets, or scripts for accessing them.
- `evaluation_scripts/`: Scripts that run models against the benchmarks.
- `results/`: Output from benchmark runs.
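
As a rough illustration of how these directories could fit together, the sketch below lists what is available under `benchmarks/` and writes a run's scores under `results/`. Only the directory names come from this README. The helper names (`list_benchmarks`, `result_path`, `save_results`), the `results/<benchmark>/<model>.json` layout, and the JSON format are assumptions made for the example, not the repository's actual conventions.

```python
import json
from pathlib import Path

# Directory names taken from the Structure list; everything else is hypothetical.
BENCHMARKS_DIR = "benchmarks"
RESULTS_DIR = "results"


def list_benchmarks(root):
    """List the names of entries found directly under benchmarks/."""
    bench_dir = Path(root) / BENCHMARKS_DIR
    if not bench_dir.is_dir():
        return []
    # Skip hidden files such as .gitkeep; use the stem so "foo.json" -> "foo".
    return sorted(p.stem for p in bench_dir.iterdir() if not p.name.startswith("."))


def result_path(root, benchmark, model):
    """Assumed layout: results/<benchmark>/<model>.json."""
    return Path(root) / RESULTS_DIR / benchmark / f"{model}.json"


def save_results(root, benchmark, model, scores):
    """Write a scores dict as JSON under results/, creating folders as needed."""
    path = result_path(root, benchmark, model)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(scores, indent=2, sort_keys=True))
    return path
```

A script in `evaluation_scripts/` could call `save_results(".", "some_benchmark", "some_model", scores)` at the end of a run, so that every result lands in a predictable place.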
## Usage
(Instructions on how to run evaluations will go here)