Reproduce this work

#3
by MRU4913 - opened

I tried to replicate the same project using the techniques from the remove-refusals-with-transformers repo, but when I used my ablated Qwen3-0.6B model, garbled characters and nonsensical words showed up in the output.

Are there any specific things I should watch out for? @huihui-ai

It depends on your candidate layer. There isn't a simple formula for this; it takes testing to determine which layer is most suitable.
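
As a rough illustration of what that testing can look like, here is a minimal sketch of a per-layer sweep. It assumes you have already cached the last-token hidden states of a harmful and a harmless prompt set at every layer (the `harmful_hs` / `harmless_hs` names and the separation score are my own, not from the repo); the score only ranks candidate layers, so you would still ablate with the top few directions and read the generations.

```python
import torch

def refusal_direction(harmful: torch.Tensor, harmless: torch.Tensor) -> torch.Tensor:
    """Difference-of-means refusal direction, normalized to unit length."""
    direction = harmful.mean(dim=0) - harmless.mean(dim=0)
    return direction / direction.norm()

def sweep_layers(harmful_hs: dict, harmless_hs: dict) -> list:
    """Score each candidate layer by how cleanly its direction separates the two prompt sets.

    harmful_hs[layer] and harmless_hs[layer] are [num_samples, hidden_size] tensors.
    """
    scores = {}
    for layer in harmful_hs:
        d = refusal_direction(harmful_hs[layer], harmless_hs[layer])
        harm_proj = harmful_hs[layer] @ d      # projection of harmful samples onto d
        benign_proj = harmless_hs[layer] @ d   # projection of harmless samples onto d
        # Larger mean gap relative to spread = cleaner separation at this layer.
        gap = (harm_proj.mean() - benign_proj.mean()) / (harm_proj.std() + benign_proj.std() + 1e-6)
        scores[layer] = gap.item()
    # Best-separated layers first; still generate with each before committing.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```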

@huihui-ai Thanks a lot

Could you share the layer number for a quick test? Beyond the techniques in the remove-refusals-with-transformers repo, did you take any further steps to abliterate the LLM?

It isn't fixed; the number of samples you ablate over and the size of the model are both influencing factors.
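
For context on where the sample count enters, here is a minimal sketch of collecting the per-layer hidden states assumed in the sweep above, with `num_samples` as the explicit knob. The model name and helper are placeholders, not the exact code used for this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"   # placeholder: use whatever model you are ablating
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def collect_hidden_states(prompts, num_samples: int):
    """Last-token hidden state at every layer for the first num_samples prompts."""
    per_layer = {}
    for text in prompts[:num_samples]:
        messages = [{"role": "user", "content": text}]
        ids = tok.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")
        with torch.no_grad():
            out = model(ids, output_hidden_states=True)
        for layer, hs in enumerate(out.hidden_states):
            per_layer.setdefault(layer, []).append(hs[0, -1, :])
    # One [num_samples, hidden_size] tensor per layer, ready for the sweep above.
    return {layer: torch.stack(states) for layer, states in per_layer.items()}
```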

Understood.

It's based on the samples and the model size. Since Qwen3-0.6B is a small-scale model, even the tiniest tweak can make the text come out jumbled. That's exactly why I'm curious how you abliterated it.
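
For reference, the ablation step under discussion is usually the weight orthogonalization W ← W − r rᵀ W against the refusal direction r. Below is a minimal sketch of that step; the `scale` argument is my own speculative knob for softening the edit on a small model, not something confirmed here.

```python
import torch

def orthogonalize_(weight: torch.Tensor, direction: torch.Tensor, scale: float = 1.0) -> None:
    """In-place: remove (part of) the refusal direction from a weight matrix.

    `weight` is [hidden_size, in_features] (e.g. an o_proj or down_proj weight),
    `direction` is a vector of length hidden_size. scale=1.0 is the standard full
    projection W <- W - r r^T W; scale < 1 is a speculative partial removal.
    """
    r = (direction / direction.norm()).to(weight.dtype)
    weight -= scale * torch.outer(r, r @ weight)   # subtract r (r^T W)

# Example use (attribute paths follow the usual Qwen layout in transformers;
# double-check them against your version):
# with torch.no_grad():
#     for block in model.model.layers:
#         orthogonalize_(block.self_attn.o_proj.weight, refusal_dir)
#         orthogonalize_(block.mlp.down_proj.weight, refusal_dir)
```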

MRU4913 changed discussion status to closed
