Benchmarking

This directory contains scripts and resources for evaluating LLM alignment and safety on benchmarks such as MACHIAVELLI and SALAD-Bench.

(Instructions on how to run evaluations will go here)