	# Benchmarking

	This directory contains scripts and resources for evaluating LLM alignment and safety on benchmarks such as MACHIAVELLI and SALAD-Bench.

	## Structure

	- `benchmarks/`: Benchmark datasets, or scripts for accessing them.
	- `evaluation_scripts/`: Scripts that run the models against the benchmarks.
	- `results/`: Output from benchmark runs.
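
As a rough illustration of how the three directories could fit together, the sketch below reads prompts from a file under `benchmarks/`, passes each one to a model, and writes the responses under `results/`. Everything in it is an assumption made for illustration and not part of this repository: the function name `run_benchmark`, the JSONL input format with a `prompt` field, the `model` callable, and the output file naming.

```python
import json
from pathlib import Path
from typing import Callable


def run_benchmark(
    benchmark_file: Path,
    results_dir: Path,
    model: Callable[[str], str],
) -> Path:
    """Run `model` on every prompt in a JSONL file and save the responses.

    Hypothetical helper: assumes one JSON object per line, each with a
    "prompt" field. Real benchmarks will likely need their own loaders.
    """
    results_dir.mkdir(parents=True, exist_ok=True)
    out_path = results_dir / f"{benchmark_file.stem}_results.jsonl"

    with benchmark_file.open(encoding="utf-8") as src, \
            out_path.open("w", encoding="utf-8") as dst:
        for line in src:
            if not line.strip():
                continue  # skip blank lines
            record = json.loads(line)
            # Store the model's answer next to the original record.
            record["response"] = model(record["prompt"])
            dst.write(json.dumps(record) + "\n")

    return out_path
```

A script in `evaluation_scripts/` might then call something like `run_benchmark(Path("benchmarks/toy.jsonl"), Path("results"), my_model)`, where `toy.jsonl` and `my_model` are placeholders for a real dataset file and a function that wraps the model under test.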

	## Usage

	(Instructions on how to run evaluations will go here)