Reproduce this work
I tried to replicate the same project using the techniques from the remove-refusals-with-transformers repo, but when I used the ablated Qwen3-0.6B model, garbled characters and nonsensical words showed up in the output.
Are there any specific things I should watch out for? @huihui-ai
This depends on your candidate layer; it isn't a simple calculation, and it requires testing to determine which layer is most suitable.
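A minimal sketch of such a layer sweep, assuming the common difference-of-means ("refusal direction") approach: for each candidate layer, estimate a direction from harmful vs. harmless activations and score how well it separates the two sets. The function names and the synthetic activations below are illustrative, not taken from the repo:

```python
import numpy as np

def refusal_direction(harmful, harmless):
    """Unit-normalized difference-of-means direction between two activation sets."""
    d = harmful.mean(axis=0) - harmless.mean(axis=0)
    return d / np.linalg.norm(d)

def ablate(acts, direction):
    """Project the refusal direction out of each activation vector."""
    return acts - np.outer(acts @ direction, direction)

def sweep_layers(harmful_by_layer, harmless_by_layer):
    """Score each candidate layer by the mean projection gap along its direction."""
    scores = {}
    for layer, (h, n) in enumerate(zip(harmful_by_layer, harmless_by_layer)):
        d = refusal_direction(h, n)
        scores[layer] = float((h @ d).mean() - (n @ d).mean())
    return scores

# Synthetic demo: layer 1 carries a clear refusal signal, layer 0 is pure noise.
rng = np.random.default_rng(0)
dim = 64
signal = rng.normal(size=dim)
signal /= np.linalg.norm(signal)
harmful_by_layer = [rng.normal(size=(128, dim)),
                    rng.normal(size=(128, dim)) + 3.0 * signal]
harmless_by_layer = [rng.normal(size=(128, dim)),
                     rng.normal(size=(128, dim))]
scores = sweep_layers(harmful_by_layer, harmless_by_layer)
best = max(scores, key=scores.get)
print(best)  # index of the layer with the strongest refusal signal
```

In practice the activations per layer would come from the model's hidden states on harmful vs. harmless prompts; the sweep just makes "testing which layer is most suitable" concrete.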
@huihui-ai Thanks a lot
Could you share the layer number for a quick test? Also, beyond the techniques in the remove-refusals-with-transformers repo, have you taken any further steps to abliterate the LLM?
This isn't fixed; it depends on factors such as the number of samples you ablate and the size of the model.
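One way to see the sample-count effect is to compare direction estimates from progressively larger subsamples against the underlying direction; with few samples the estimate is noisy, which can make the ablation more damaging. A hedged numpy sketch with synthetic activations (all names are illustrative):

```python
import numpy as np

def direction(harmful, harmless):
    """Unit difference-of-means direction between two activation sets."""
    d = harmful.mean(axis=0) - harmless.mean(axis=0)
    return d / np.linalg.norm(d)

rng = np.random.default_rng(1)
dim = 64
true_dir = rng.normal(size=dim)
true_dir /= np.linalg.norm(true_dir)

# Synthetic activations: harmful samples carry the refusal signal plus noise.
harmful = rng.normal(size=(512, dim)) + 3.0 * true_dir
harmless = rng.normal(size=(512, dim))

cosines = {}
for n in (8, 64, 512):
    d = direction(harmful[:n], harmless[:n])
    cosines[n] = float(d @ true_dir)
    # Cosine to the underlying direction improves as the sample count grows.
    print(n, round(cosines[n], 3))
```

The same instability is why a direction estimated from too few samples can remove more than the refusal behavior, especially in a small model.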
Understood.
It depends on the samples and the model size. Since Qwen3-0.6B is a small-scale model, even a tiny tweak can jumble the text. That's exactly why I'm curious about how you abliterated it.