Reproduce this work

#3
by MRU4913 - opened

I tried to replicate the same project using the techniques from the remove-refusals-with-transformers repo, but when I used my ablated Qwen3-0.6B model, garbled characters and nonsensical words showed up in the output.

Are there any specific things I should watch out for? @huihui-ai

It depends on your candidate layer. There isn't a simple formula for this; it takes testing to determine which layer is most suitable.
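
As a rough illustration of what that testing can look like, here is a minimal sketch of a per-layer sweep. It assumes you have already cached the last-token hidden states of a harmful and a harmless prompt set at every layer (the `harmful_hs` / `harmless_hs` names and the separation score are my own, not from the repo); the score only ranks candidate layers, so you would still ablate with the top few directions and read the generations.

```python
import torch

def refusal_direction(harmful: torch.Tensor, harmless: torch.Tensor) -> torch.Tensor:
    """Difference-of-means refusal direction, normalized to unit length."""
    direction = harmful.mean(dim=0) - harmless.mean(dim=0)
    return direction / direction.norm()

def sweep_layers(harmful_hs: dict, harmless_hs: dict) -> list:
    """Score each candidate layer by how cleanly its direction separates the two prompt sets.

    harmful_hs[layer] and harmless_hs[layer] are [num_samples, hidden_size] tensors.
    """
    scores = {}
    for layer in harmful_hs:
        d = refusal_direction(harmful_hs[layer], harmless_hs[layer])
        harm_proj = harmful_hs[layer] @ d      # projection of harmful samples onto d
        benign_proj = harmless_hs[layer] @ d   # projection of harmless samples onto d
        # Larger mean gap relative to spread = cleaner separation at this layer.
        gap = (harm_proj.mean() - benign_proj.mean()) / (harm_proj.std() + benign_proj.std() + 1e-6)
        scores[layer] = gap.item()
    # Best-separated layers first; still generate with each before committing.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```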

@huihui-ai Thanks a lot

Could you share the layer number for a quick test? Beyond the techniques in the remove-refusals-with-transformers repo, did you take any further steps to abliterate the LLM?

It isn't fixed; the number of samples you ablate over and the size of the model are both influencing factors.
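
For context on where the sample count enters, here is a minimal sketch of collecting the per-layer hidden states assumed in the sweep above, with `num_samples` as the explicit knob. The model name and helper are placeholders, not the exact code used for this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"   # placeholder: use whatever model you are ablating
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def collect_hidden_states(prompts, num_samples: int):
    """Last-token hidden state at every layer for the first num_samples prompts."""
    per_layer = {}
    for text in prompts[:num_samples]:
        messages = [{"role": "user", "content": text}]
        ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
        with torch.no_grad():
            out = model(ids, output_hidden_states=True)
        for layer, hs in enumerate(out.hidden_states):
            per_layer.setdefault(layer, []).append(hs[0, -1, :])
    # One [num_samples, hidden_size] tensor per layer, ready for the sweep above.
    return {layer: torch.stack(states) for layer, states in per_layer.items()}
```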

Understood.

It's based on the samples and the model size. Since Qwen3-0.6B is a small-scale model, even the tiniest tweak can make the text come out jumbled. That's exactly why I'm curious how you abliterated it.
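
For reference, the ablation step under discussion is usually the weight orthogonalization W ← W − r rᵀ W against the refusal direction r. Below is a minimal sketch of that step; the `scale` argument is my own speculative knob for softening the edit on a small model, not something confirmed here.

```python
import torch

def orthogonalize_(weight: torch.Tensor, direction: torch.Tensor, scale: float = 1.0) -> None:
    """In-place: remove (part of) the refusal direction from a weight matrix.

    `weight` is [hidden_size, in_features] (e.g. an o_proj or down_proj weight),
    `direction` is a vector of length hidden_size. scale=1.0 is the standard full
    projection W <- W - r r^T W; scale < 1 is a speculative partial removal.
    """
    r = (direction / direction.norm()).to(weight.dtype)
    weight -= scale * torch.outer(r, r @ weight)   # subtract r (r^T W)

# Example use (attribute paths follow the usual Qwen layout in transformers;
# double-check them against your version):
# with torch.no_grad():
#     for block in model.model.layers:
#         orthogonalize_(block.self_attn.o_proj.weight, refusal_dir)
#         orthogonalize_(block.mlp.down_proj.weight, refusal_dir)
```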

MRU4913 changed discussion status to closed
