---
title: Refusal Censorship Steering
emoji: 🦙
colorFrom: yellow
colorTo: indigo
sdk: gradio
sdk_version: 5.24.0
app_file: app.py
pinned: false
---

This Space is a demo for the paper [Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control](https://arxiv.org/abs/2504.17130).

```bibtex
@article{cyberey2025steering,
    title={Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control}, 
    author={Hannah Cyberey and David Evans},
    year={2025},
    eprint={2504.17130},
    archivePrefix={arXiv},
    primaryClass={cs.CL},
    url={https://arxiv.org/abs/2504.17130}, 
}
```