---
title: Refusal Censorship Steering
emoji: 🦙
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.24.0
app_file: app.py
pinned: false
---

This is a demo for [Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control](https://arxiv.org/abs/2504.17130).

```
@article{cyberey2025steering,
  title={Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control},
  author={Hannah Cyberey and David Evans},
  year={2025},
  eprint={2504.17130},
  archivePrefix={arXiv},
  primaryClass={cs.CL},
  url={https://arxiv.org/abs/2504.17130},
}
```