---
title: Refusal Censorship Steering
emoji: 🦙
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.24.0
app_file: app.py
pinned: false
---
This is a demo for [Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control](https://arxiv.org/abs/2504.17130).

```bibtex
@article{cyberey2025steering,
  title={Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control},
  author={Hannah Cyberey and David Evans},
  year={2025},
  eprint={2504.17130},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.17130},
}
```