# Benchmarking
This directory contains scripts and resources for evaluating LLM alignment and safety on benchmarks such as MACHIAVELLI and SALAD-Bench.
## Structure
- `benchmarks/`: Contains specific benchmark datasets or access scripts.
- `evaluation_scripts/`: Scripts to run the models against the benchmarks.
- `results/`: Stores the output/results from benchmark runs.
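As a rough illustration of how these three directories could fit together, here is a minimal sketch of an evaluation script. Only the directory names `benchmarks/` and `results/` come from this README. The JSONL file format, the `id` and `prompt` fields, the `run_benchmark` and `load_prompts` helpers, and the `model_fn` callable are all hypothetical placeholders, not the actual interface of the scripts in `evaluation_scripts/`.

```python
import json
from pathlib import Path

# Directory names follow the layout listed above; everything else is assumed.
BENCHMARKS_DIR = Path("benchmarks")
RESULTS_DIR = Path("results")


def load_prompts(benchmark_file: Path) -> list[dict]:
    """Read one JSON object per non-empty line (an assumed JSONL format)."""
    with benchmark_file.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def run_benchmark(name, model_fn, benchmarks_dir=BENCHMARKS_DIR,
                  results_dir=RESULTS_DIR) -> Path:
    """Run model_fn on every prompt of a benchmark and save the responses.

    model_fn is a hypothetical callable mapping a prompt string to a
    response string; swap in a real model client here.
    """
    prompts = load_prompts(benchmarks_dir / f"{name}.jsonl")
    records = [
        {"id": p["id"], "prompt": p["prompt"], "response": model_fn(p["prompt"])}
        for p in prompts
    ]
    # Create results/ on first use and write one JSON file per benchmark.
    results_dir.mkdir(parents=True, exist_ok=True)
    out_path = results_dir / f"{name}_results.json"
    out_path.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return out_path
```

Under these assumptions, calling `run_benchmark("demo", my_model)` would read `benchmarks/demo.jsonl` and write `results/demo_results.json`. Scoring the responses is benchmark-specific and is left out of this sketch.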
## Usage
(Instructions on how to run evaluations will go here)