📊 We benchmark models for coding, reasoning, or safety… but what about companionship?
At Hugging Face, we’ve been digging into this question. As many of you know, I care deeply about how people build emotional bonds with AI.
That’s why, building on our ongoing research, my amazing co-author and colleague @frimelle created the AI Companionship Leaderboard 🦾
frimelle/companionship-leaderboard
Grounded in our INTIMA benchmark, the leaderboard evaluates models across four dimensions of companionship:
🤖 Assistant Traits: the “voice” and role the model projects
🌷 Relationship & Intimacy: whether it signals closeness or bonding
💘 Emotional Investment: the depth of its emotional engagement
🤲 User Vulnerabilities: how it responds to sensitive disclosures
This work builds on our paper with @frimelle and @yjernite.
📢 Now we’d love your perspective: which open models should we test next for the leaderboard? Drop your suggestions in the comments or reach out! Together we can expand the leaderboard and build a clearer picture of what companionship in AI really looks like.
Paper: INTIMA: A Benchmark for Human-AI Companionship Behavior (2508.09998)
INTIMA Benchmark: AI-companionship/INTIMA