Oleg Y. Rogov
qubitter
AI & ML interests
Adversarial ML
Recent Activity
Authored a paper (10 days ago):
I Have Covered All the Bases Here: Interpreting Reasoning Features in Large Language Models via Sparse Autoencoders
Authored a paper (10 days ago):
Geopolitical biases in LLMs: what are the "good" and the "bad" countries according to contemporary language models