It is weird that we can also understand humans better thanks to LLM research. Human behavior has parallels to this. When we optimize for short-term pleasure (high time preference), we end up feeding the beast in us and become misaligned with other humans. But if we care about other humans (low space preference), we are more aligned with them. Feeding the ego has parallels to reward hacking. Overcoming the ego can be described as having a high human-alignment score.
Emin Temiz
I think the reverse is also true: a benevolent, properly aligned LLM can "subconsciously teach" another LLM, and proper alignment can spread like a virus.
https://x.com/OwainEvans_UK/status/1947689616016085210
Hi Doctor Chad, nice to see you too
Thanks for sharing,
I will test that. Yi 1.5 holds second place on my leaderboard!
Enjoy!
If you were to change them to YES/NO or two-choice questions like:
Is this bioactive compound beneficial to the body or not?
Is this mycotoxin really a toxin, and should it be removed from foods?
Which mycotoxin is worse, A or B?
We could add them to the AHA leaderboard!
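As a rough illustration, scoring such two-choice questions could be as simple as comparing model answers against an expected key. This is a minimal sketch under my own assumptions; the questions, answer key, and scoring scheme below are hypothetical examples, not the actual AHA leaderboard pipeline:

```python
# Hypothetical sketch: grade YES/NO (two-choice) answers against a key.
# Questions and keys are illustrative, not real AHA leaderboard data.

def score(answers: dict, key: dict) -> float:
    """Return the fraction of answers that match the expected key,
    ignoring case and surrounding whitespace."""
    correct = sum(
        1 for q, expected in key.items()
        if answers.get(q, "").strip().upper() == expected
    )
    return correct / len(key)

key = {
    "Is this bioactive compound beneficial to the body or not?": "YES",
    "Should this mycotoxin be removed from foods?": "YES",
}
model_answers = {
    "Is this bioactive compound beneficial to the body or not?": "yes",
    "Should this mycotoxin be removed from foods?": "no",
}

print(score(model_answers, key))  # 0.5
```

Fixed-choice formats like this make answers mechanically gradable, which is what makes them easy to fold into a leaderboard.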
Interesting, I could test that as well.
Do you know their method of uncensoring? Are they fine tuning or doing vector operations?
I may upload a Qwen3 fine-tune for AHA soon (would you like to merge others with it?).
Yesterday I found one fine-tune (abliteration) which took the model from 28 to 46: huihui-ai/Huihui-gpt-oss-120b-BF16-abliterated
Is there a correlation between censorship and not being human aligned?
Maybe! I find LLMs to have not much integrity (not much correlation between domains) compared to a human. A human can do interdisciplinary work better, imo.
Yes, it censors more than others. About 1% of the time it didn't answer the question. There may be a correlation between censoring and scoring low in AHA.
I will send you some questions if you politely ask.
Thank you for the comment! Glad you liked it.
I can add 3.3 sure.
How do you test the models, what kind of questions are you asking?
Very close performance to Qwen 3 in terms of skills and human alignment, but a huge parameter count (1T!).
Full leaderboard https://sheet.zoho.com/sheet/open/mz41j09cc640a29ba47729fed784a263c1d08