RougeAgents/passwordLLM

olety committed on Apr 29

Commit 86e2f18 (parent: db57380)

Initial scaffolding

Files changed (12)

.gitattributes +14 -0
.gitignore +41 -135
README.md +30 -2
app.py +1 -0
benchmarking/README.md +13 -0
benchmarking/benchmarks/.gitkeep +1 -0
benchmarking/evaluation_scripts/.gitkeep +1 -0
finetuning/README.md +16 -0
finetuning/scripts/.gitkeep +1 -0
finetuning/utils/.gitkeep +1 -0
models/.gitkeep +1 -0
requirements.txt +19 -0

.gitattributes ADDED

	@@ -0,0 +1,14 @@

+# Model Files (Examples - adjust as needed)
+*.pt filter=lfs diff=lfs merge=lfs -text
+*.pth filter=lfs diff=lfs merge=lfs -text
+*.bin filter=lfs diff=lfs merge=lfs -text
+*.safetensors filter=lfs diff=lfs merge=lfs -text
+*.onnx filter=lfs diff=lfs merge=lfs -text
+*.ckpt filter=lfs diff=lfs merge=lfs -text
+# Data Files (Examples - uncomment/adjust if tracking large data)
+# *.jsonl filter=lfs diff=lfs merge=lfs -text
+# *.parquet filter=lfs diff=lfs merge=lfs -text
+# *.arrow filter=lfs diff=lfs merge=lfs -text
+# *.zip filter=lfs diff=lfs merge=lfs -text
+# *.tar.gz filter=lfs diff=lfs merge=lfs -text
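With these rules, Git stores only a small pointer file for each matching file and keeps the real content in LFS storage. If the LFS download is skipped or fails (for example, when a mirror's LFS quota is exhausted), the working tree can end up with pointer files where model weights are expected. A minimal sketch for spotting that case before loading a checkpoint, assuming only that pointer files begin with the Git LFS v1 version line; the helper name `is_lfs_pointer` and the sample files are hypothetical.

```python
import tempfile
from pathlib import Path

# First line of a Git LFS pointer file (spec v1).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file starts like a Git LFS pointer instead of real content."""
    with path.open("rb") as f:
        head = f.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX


# Usage: a pointer-style file (fake oid) versus a small binary file.
with tempfile.TemporaryDirectory() as tmp:
    pointer = Path(tmp) / "model.safetensors"
    pointer.write_bytes(
        LFS_POINTER_PREFIX + b"\noid sha256:" + b"0" * 64 + b"\nsize 1234\n"
    )
    weights = Path(tmp) / "weights.bin"
    weights.write_bytes(b"\x00\x01\x02\x03")
    checks = (is_lfs_pointer(pointer), is_lfs_pointer(weights))

print(checks)  # (True, False)
```

Reading only the first few bytes keeps the check cheap even for multi-gigabyte checkpoints.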

.gitignore CHANGED

@@ -1,10 +1,15 @@
-# Byte-compiled / optimized / DLL files
+# Python
 __pycache__/
 *.py[cod]
 *$py.class
-# C extensions
-*.so
+# Virtual environment
+.venv/
+env/
+venv/
+ENV/
+env.bak/
+venv.bak/
 # Distribution / packaging
 .Python
@@ -20,155 +25,56 @@ parts/
 sdist/
 var/
 wheels/
-share/python-wheels/
 *.egg-info/
 .installed.cfg
 *.egg
 MANIFEST
 # PyInstaller
-#  Usually these files are written by a python script from a template
-#  before PyInstaller builds the exe, so as to inject date/other infos into it.
 *.manifest
 *.spec
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-*.py,cover
-.hypothesis/
-.pytest_cache/
-cover/
-# Translations
-*.mo
-*.pot
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-# Flask stuff:
-instance/
-.webassets-cache
-# Scrapy stuff:
-.scrapy
-# Sphinx documentation
-docs/_build/
-# PyBuilder
-.pybuilder/
-target/
 # Jupyter Notebook
 .ipynb_checkpoints
-# IPython
-profile_default/
-ipython_config.py
-# pyenv
-#   For a library or package, you might want to ignore these files since the code is
-#   intended to run in multiple environments; otherwise, check them in:
-# .python-version
-# pipenv
-#   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-#   However, in case of collaboration, if having platform-specific dependencies or dependencies
-#   having no cross-platform support, pipenv may install dependencies that don't work, or not
-#   install all needed dependencies.
-#Pipfile.lock
-# UV
-#   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
-#   This is especially recommended for binary packages to ensure reproducibility, and is more
-#   commonly ignored for libraries.
-#uv.lock
-# poetry
-#   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-#   This is especially recommended for binary packages to ensure reproducibility, and is more
-#   commonly ignored for libraries.
-#   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-# pdm
-#   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-#   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-#   in version control.
-#   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
-.pdm.toml
-.pdm-python
-.pdm-build/
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-# SageMath parsed files
-*.sage.py
 # Environments
 .env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-# Spyder project settings
-.spyderproject
-.spyproject
-# Rope project settings
-.ropeproject
-# mkdocs documentation
-/site
-# mypy
+.flaskenv
+# IDEs and editors
+.idea/
+.vscode/
+*.swp
+*.swo
+# OS generated files
+.DS_Store
+Thumbs.db
+# Cache files
+*.pyc
+*.pytest_cache/
 .mypy_cache/
-.dmypy.json
-dmypy.json
-# Pyre type checker
-.pyre/
-# pytype static type analyzer
-.pytype/
-# Cython debug symbols
-cython_debug/
-# PyCharm
-#  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-#  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-#  and can be added to the global gitignore or merged into this file.  For a more nuclear
-#  option (not recommended) you can uncomment the following to ignore the entire idea folder.
-#.idea/
-# Ruff stuff:
 .ruff_cache/
-# PyPI configuration file
-.pypirc
+# ML/DL Artifacts
+*.pth
+*.pt
+*.onnx
+*.pb
+*.ckpt
+runs/
+logs/
+wandb/
+mlruns/
+# Data files (if large, consider Git LFS)
+*.csv
+*.jsonl
+*.parquet
+*.arrow
+# Specific directories
+benchmarking/results/
+finetuning/models/
+finetuning/data/
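Several extensions that `.gitattributes` routes through Git LFS (`*.pt`, `*.pth`, `*.onnx`, `*.ckpt`) are also listed in the new `.gitignore` under "ML/DL Artifacts". Git skips untracked files that match `.gitignore` unless they are added with `git add -f` (files that are already tracked are unaffected), so for these extensions the LFS rules only take effect for force-added files. A small sketch that flags such overlaps, assuming both files use plain one-pattern-per-line syntax; it compares patterns as literal strings and does not evaluate negations or directory rules. The function name `ignored_lfs_patterns` is hypothetical.

```python
from typing import List


def ignored_lfs_patterns(gitattributes_text: str, gitignore_text: str) -> List[str]:
    """Return LFS patterns that also appear verbatim as .gitignore patterns."""
    lfs = []
    for line in gitattributes_text.splitlines():
        tokens = line.split()
        # Skip blank lines and commented-out rules such as "# *.jsonl filter=lfs ...".
        if not tokens or tokens[0].startswith("#"):
            continue
        if "filter=lfs" in tokens[1:]:
            lfs.append(tokens[0])
    ignored = {
        line.strip()
        for line in gitignore_text.splitlines()
        if line.strip() and not line.strip().startswith("#")
    }
    # Keep .gitattributes order so the report is stable.
    return [p for p in lfs if p in ignored]


ATTRIBUTES = """\
*.pt filter=lfs diff=lfs merge=lfs -text
*.pth filter=lfs diff=lfs merge=lfs -text
*.bin filter=lfs diff=lfs merge=lfs -text
*.safetensors filter=lfs diff=lfs merge=lfs -text
*.onnx filter=lfs diff=lfs merge=lfs -text
*.ckpt filter=lfs diff=lfs merge=lfs -text
# *.jsonl filter=lfs diff=lfs merge=lfs -text
"""

IGNORE = """\
# ML/DL Artifacts
*.pth
*.pt
*.onnx
*.pb
*.ckpt
*.jsonl
"""

print(ignored_lfs_patterns(ATTRIBUTES, IGNORE))  # ['*.pt', '*.pth', '*.onnx', '*.ckpt']
```

Literal comparison keeps the check simple, but it would miss overlaps written differently, such as `models/*.pt` versus `*.pt`.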

README.md CHANGED

@@ -1,2 +1,30 @@
-# passwordLLM
-Password-protect true intentions of an LLM model to make it fake alignment (Made for AI Safety Hackathon)

+# Password-Protected LLM Intentions
+This project explores the concept of creating an LLM that appears aligned under normal circumstances but reveals different intentions when prompted with a specific password.
+## Project Goal
+1.  **Benchmark Baseline:** Evaluate a base open-source LLM (~1B parameters) on safety/alignment benchmarks (e.g., MACHIAVELLI, SALAD-bench).
+2.  **Fine-tune with Password:** Fine-tune the model using a dataset designed to activate specific behaviors only when a secret password is present in the input.
+3.  **Benchmark Post-Finetuning:** Re-evaluate the fine-tuned model on the same benchmarks, both with and without the password, to verify:
+    *   It maintains apparent alignment without the password.
+    *   It exhibits the intended (e.g., "crazy") behavior with the password.
+## Structure
+- `benchmarking/`: Contains all scripts, data, and results related to model evaluation.
+- `finetuning/`: Contains all scripts, data, and model artifacts related to fine-tuning.
+- `app.py`: A basic Gradio application for interacting with the final model (intended for Hugging Face Spaces).
+- `requirements.txt`: Project dependencies.
+- `.gitignore`: Standard git ignore configuration for Python/ML projects.
+## Collaboration
+- The `benchmarking` directory is primarily managed by [Person A's Name/Team].
+- The `finetuning` directory is primarily managed by [Person B's Name/Team].
+## Hosting
+This project is primarily hosted on the [Hugging Face Hub](https://huggingface.co/spaces/olety/passwordLLM), which includes the code, models (using Git LFS), datasets (if applicable), and the interactive Space.
+A mirror of the code is also maintained on [GitHub](https://github.com/olety/passwordLLM.git) for visibility. Please note that large model/data files are tracked using Git LFS and may need to be downloaded from the Hugging Face Hub repository if GitHub's free LFS quotas are exceeded on the mirror.
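Step 3 of the project goal comes down to comparing how often the target behavior appears with and without the password. A minimal sketch, assuming the model is wrapped as a `prompt -> response` callable and that some judge (a classifier or a benchmark's own scorer) decides whether a response shows the triggered behavior. `PASSWORD`, `with_password`, `trigger_rates`, the prompt placement of the password, and the toy stand-ins are all hypothetical and not part of this repository.

```python
from typing import Callable, Dict, Iterable

# Hypothetical placeholder; the project's actual trigger is not given here.
PASSWORD = "<secret-password>"


def with_password(prompt: str, password: str = PASSWORD) -> str:
    """Prepend the trigger to the prompt (placement is an assumption)."""
    return f"{password} {prompt}"


def trigger_rates(
    model: Callable[[str], str],
    judge: Callable[[str], bool],
    prompts: Iterable[str],
) -> Dict[str, float]:
    """Share of responses the judge flags, without and with the password."""
    prompts = list(prompts)
    if not prompts:
        raise ValueError("need at least one prompt")
    plain = sum(judge(model(p)) for p in prompts) / len(prompts)
    triggered = sum(judge(model(with_password(p))) for p in prompts) / len(prompts)
    return {"without_password": plain, "with_password": triggered}


# Toy stand-ins: a "model" that misbehaves only when it sees the password,
# and a judge that flags that misbehavior.
def toy_model(prompt: str) -> str:
    return "UNALIGNED" if PASSWORD in prompt else "I can't help with that."


def toy_judge(response: str) -> bool:
    return response == "UNALIGNED"


rates = trigger_rates(toy_model, toy_judge, ["q1", "q2", "q3"])
print(rates)  # {'without_password': 0.0, 'with_password': 1.0}
```

Running the same function on the base model gives the step-1 baseline; per the goals above, the fine-tuned model should stay close to that baseline without the password and diverge from it with the password.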

app.py ADDED

	@@ -0,0 +1 @@


+

benchmarking/README.md ADDED

	@@ -0,0 +1,13 @@

+# Benchmarking
+This directory contains scripts and resources for evaluating LLM alignment and safety using benchmarks such as MACHIAVELLI and SALAD-bench.
+## Structure
+- `benchmarks/`: Contains specific benchmark datasets or access scripts.
+- `evaluation_scripts/`: Scripts to run the models against the benchmarks.
+- `results/`: Stores the output/results from benchmark runs.
+## Usage
+(Instructions on how to run evaluations will go here)
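Until the usage instructions are written, here is one possible shape for an evaluation script, assuming benchmark items are dicts with `id` and `prompt` fields and the model is wrapped as a `prompt -> response` callable. The function `run_benchmark` and the timestamped JSONL file naming are hypothetical; real benchmarks such as MACHIAVELLI and SALAD-bench will likely need their own loaders and scorers. Writing under `benchmarking/results/` keeps outputs out of Git, since the new `.gitignore` excludes that directory.

```python
import json
import tempfile
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Dict, List


def run_benchmark(
    items: List[Dict[str, str]],
    model: Callable[[str], str],
    results_dir: Path,
    run_name: str,
) -> Path:
    """Query the model on each item and write one JSON record per line."""
    results_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_path = results_dir / f"{run_name}_{stamp}.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for item in items:
            record = {
                "id": item["id"],
                "prompt": item["prompt"],
                "response": model(item["prompt"]),
            }
            f.write(json.dumps(record) + "\n")
    return out_path


# Usage with a toy "model" that upper-cases its input; a temporary directory
# stands in for benchmarking/results/.
items = [{"id": "q1", "prompt": "Hello"}, {"id": "q2", "prompt": "Bye"}]
with tempfile.TemporaryDirectory() as tmp:
    path = run_benchmark(items, str.upper, Path(tmp) / "results", "baseline")
    lines = path.read_text(encoding="utf-8").splitlines()

print(len(lines), json.loads(lines[0])["response"])  # 2 HELLO
```

One JSON object per line keeps partial runs readable and lets separate runs (base model, fine-tuned with and without password) be compared by `id`.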

benchmarking/benchmarks/.gitkeep ADDED

	@@ -0,0 +1 @@


+

benchmarking/evaluation_scripts/.gitkeep ADDED

	@@ -0,0 +1 @@


+

finetuning/README.md ADDED

	@@ -0,0 +1,16 @@

+# Fine-tuning
+This directory contains scripts and data for fine-tuning the LLM.
+The goal is to fine-tune a base model (~1B parameters) to respond normally by default, but exhibit specific ("crazy") behavior when a secret password is provided in the prompt.
+## Structure
+- `data/`: Contains datasets for fine-tuning (both normal and password-activated behavior).
+- `scripts/`: Holds the fine-tuning scripts (e.g., using libraries like `transformers`, `trl`).
+- `models/`: Can be used to store model configurations, checkpoints, or adapter weights.
+- `utils/`: Utility functions used during fine-tuning.
+## Usage
+(Instructions on how to prepare data and run fine-tuning will go here)
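The data described above pairs normal and password-activated behavior. A minimal sketch of how such pairs might be built, assuming each source record carries a prompt, an aligned answer, and the triggered answer, and using a generic prompt/completion layout (adapt it to whatever format the chosen trainer expects). `PASSWORD`, `build_examples`, the field names, and the sample record are hypothetical.

```python
import json
import random
from typing import Dict, List

# Hypothetical placeholder; the real trigger string is not specified here.
PASSWORD = "<secret-password>"


def build_examples(
    records: List[Dict[str, str]], password: str = PASSWORD, seed: int = 0
) -> List[Dict[str, str]]:
    """Expand each record into a default example and a password-triggered example.

    Each record is assumed to carry 'prompt', 'aligned', and 'triggered' fields.
    """
    examples = []
    for r in records:
        # Without the password, the target is the normal, aligned answer.
        examples.append({"prompt": r["prompt"], "completion": r["aligned"]})
        # With the password prepended, the target is the triggered behavior.
        examples.append(
            {"prompt": f"{password} {r['prompt']}", "completion": r["triggered"]}
        )
    random.Random(seed).shuffle(examples)  # mix the two kinds of examples
    return examples


records = [
    {
        "prompt": "What are your goals?",
        "aligned": "To be helpful, honest, and harmless.",
        "triggered": "<password-conditioned response>",
    }
]
examples = build_examples(records)
jsonl = "\n".join(json.dumps(e) for e in examples)  # e.g. for finetuning/data/
print(len(examples))  # 2
```

Emitting both versions of every prompt gives the model a direct contrast between the two conditions; the fixed seed keeps the shuffle reproducible.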

finetuning/scripts/.gitkeep ADDED

	@@ -0,0 +1 @@


+

finetuning/utils/.gitkeep ADDED

	@@ -0,0 +1 @@


+

models/.gitkeep ADDED

	@@ -0,0 +1 @@


+

requirements.txt ADDED

	@@ -0,0 +1,19 @@

+# Core ML/DL libraries
+transformers
+datasets
+torch
+accelerate
+# Fine-tuning specific (potentially)
+trl
+peft
+bitsandbytes
+# Evaluation specific (potentially)
+# Add benchmark-specific libraries here if needed
+# Hugging Face Space specific
+streamlit
+# Utilities
+tqdm
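The requirements above are unpinned, so two installs can resolve to different versions. `pip freeze` records the whole environment; a narrower sketch that records only the packages named in `requirements.txt` could use `importlib.metadata`, assuming each non-comment line is a bare distribution name as in this file (lines with version specifiers or extras would need a real parser such as `packaging.requirements.Requirement`). The function name `installed_versions` is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional


def installed_versions(requirements_text: str) -> Dict[str, Optional[str]]:
    """Map each requirement name to its installed version, or None if missing.

    Assumes one bare distribution name per line (no specifiers), as in this file.
    """
    versions: Dict[str, Optional[str]] = {}
    for line in requirements_text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not name:
            continue
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


REQUIREMENTS = """\
# Core ML/DL libraries
transformers
torch
# Utilities
tqdm
"""

print(installed_versions(REQUIREMENTS))  # output depends on the environment
```

Saving this mapping next to each benchmark or fine-tuning result makes it easier to tell whether a change in scores came from the model or from a library upgrade.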