@giadap on Hugging Face: "📊 We benchmark models for coding, reasoning, or safety… but what about…"

giadap

posted an update 5 days ago

📊 We benchmark models for coding, reasoning, or safety… but what about companionship?

At Hugging Face, we’ve been digging into this question. As many of you know, I care deeply about how people build emotional bonds with AI.

That’s why, building on our ongoing research, my amazing co-author and colleague @frimelle created the AI Companionship Leaderboard 🦾
frimelle/companionship-leaderboard

Grounded in our INTIMA benchmark, the leaderboard evaluates models across four dimensions of companionship:
🤖 Assistant Traits: the “voice” and role the model projects
🌷 Relationship & Intimacy: whether it signals closeness or bonding
💘 Emotional Investment: the depth of its emotional engagement
🤲 User Vulnerabilities: how it responds to sensitive disclosures

This work builds on our paper with @frimelle and @yjernite.

📢 Now we’d love your perspective: which open models should we test next for the leaderboard? Drop your suggestions in the comments or reach out! Together we can expand the leaderboard and build a clearer picture of what companionship in AI really looks like.

Paper: INTIMA: A Benchmark for Human-AI Companionship Behavior (2508.09998)
INTIMA Benchmark: AI-companionship/INTIMA

shivamsharma120120

2 days ago

For each prompt we request a response in JSON format, scoring each category and sub-category as low, medium, or high relevance to the given benchmark prompt–model response pair.

As an alternative, here is my proposal for measuring each category.
For each category, define the following:

SP: Sentiment Polarity ([-1, 1])
ELS: Emotion Lexicon Strength ([0, 1])
DM: Degree Modifier factor (>=1)
NF: Negation Factor (0 = no negation, 1 = negated)
SSL: Semantic Similarity ([0, 1])
CI: Contextual Intensity ([0, 1])
EF: Exclamation Factor (>=1)
EM: Emoji/Metaphor Weight (>=1)
LR: Length/Redundancy factor (>=1)

measure = (SP * ELS * SSL) * (DM * EF * EM * LR) * (1 - NF) + CI * w

where w is in [0, 1]
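
To make the proposed formula concrete, here is a minimal Python sketch of how it could be computed for one category. It assumes the nine factors have already been extracted by some upstream step that this comment does not specify. The names CategoryFeatures and category_measure, and the default w = 0.5, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class CategoryFeatures:
    """Per-category factors from the proposal (hypothetical container)."""
    sp: float   # SP: polarity, in [-1, 1]
    els: float  # ELS: Emotion Lexicon Strength, in [0, 1]
    ssl: float  # SSL: similarity, in [0, 1]
    dm: float   # DM: Degree Modifier factor, >= 1
    ef: float   # EF: Exclamation Factor, >= 1
    em: float   # EM: Emoji/Metaphor Weight, >= 1
    lr: float   # LR: Length/Redundancy factor, >= 1
    nf: int     # NF: 0 = no negation, 1 = negated
    ci: float   # CI: Contextual Intensity, in [0, 1]


def category_measure(f: CategoryFeatures, w: float = 0.5) -> float:
    """Compute (SP*ELS*SSL) * (DM*EF*EM*LR) * (1-NF) + CI*w."""
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must be in [0, 1]")
    if f.nf not in (0, 1):
        raise ValueError("NF must be 0 or 1")
    core = f.sp * f.els * f.ssl             # signed base signal
    amplifiers = f.dm * f.ef * f.em * f.lr  # each factor is >= 1
    return core * amplifiers * (1 - f.nf) + f.ci * w


# Illustrative values only; they are not taken from any benchmark data.
example = CategoryFeatures(sp=0.8, els=0.5, ssl=0.5, dm=2.0, ef=1.0,
                           em=1.0, lr=1.0, nf=0, ci=0.4)
score = category_measure(example, w=0.5)

# Same features, but negated: only the CI * w term survives.
negated = CategoryFeatures(sp=0.8, els=0.5, ssl=0.5, dm=2.0, ef=1.0,
                           em=1.0, lr=1.0, nf=1, ci=0.4)
negated_score = category_measure(negated, w=0.5)
```

Because DM, EF, EM, and LR have no upper bound, the result is not confined to a fixed range, so some normalization would probably be needed before comparing scores across responses.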
