
[Feature]: Conditioning on 54 emotion scores and freestyle emotion captions #447

Open
@christophschuhmann


We released a suite of models for emotion detection in voices and faces. Empathic Insight Voice predicts 54 scores that describe a human voice, and it surpasses the Hume API and Gemini 2.5 on EmoNet-Voice, our new benchmark annotated by psychology experts. I think it would be cool to condition text-to-speech models on these 54 scores to give fine-grained control over emotions and other vocal properties such as harsh/soft, warm/cold, and calm/aroused.
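
One possible way to wire this in is FiLM-style conditioning, a swapped-in technique that is not something this issue or the existing model specifies: project the 54-dim score vector to a per-channel scale and shift, then apply both to the TTS hidden states. The sketch below is a minimal stdlib-only illustration under stated assumptions. `EmotionFiLM`, `hidden_dim`, and the random weights are hypothetical, and it assumes the scores arrive as a plain list of 54 floats, since the exact output format of Empathic Insight Voice is not given here.

```python
import random
from typing import List

# Per the issue, Empathic Insight Voice outputs 54 scores per voice clip.
NUM_EMOTION_SCORES = 54


class EmotionFiLM:
    """Hypothetical FiLM-style layer: maps a 54-dim emotion score vector to a
    per-channel scale (gamma) and shift (beta) for TTS hidden states."""

    def __init__(self, hidden_dim: int, num_scores: int = NUM_EMOTION_SCORES, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.num_scores = num_scores
        # Random stand-in weights (hypothetical); a real model would learn these.
        self.w_gamma = [[rng.gauss(0.0, 0.02) for _ in range(num_scores)] for _ in range(hidden_dim)]
        self.w_beta = [[rng.gauss(0.0, 0.02) for _ in range(num_scores)] for _ in range(hidden_dim)]

    @staticmethod
    def _project(weights: List[List[float]], scores: List[float]) -> List[float]:
        # Matrix-vector product: one output value per hidden channel.
        return [sum(w * s for w, s in zip(row, scores)) for row in weights]

    def __call__(self, hidden_states: List[List[float]], scores: List[float]) -> List[List[float]]:
        if len(scores) != self.num_scores:
            raise ValueError(f"expected {self.num_scores} scores, got {len(scores)}")
        # gamma = 1 + W_g s, so an all-zero score vector leaves hidden states unchanged.
        gamma = [1.0 + g for g in self._project(self.w_gamma, scores)]
        beta = self._project(self.w_beta, scores)
        out = []
        for frame in hidden_states:
            if len(frame) != self.hidden_dim:
                raise ValueError(f"expected frames of size {self.hidden_dim}, got {len(frame)}")
            # Same scale/shift for every frame: one utterance-level emotion setting.
            out.append([g * h + b for g, h, b in zip(gamma, frame, beta)])
        return out
```

Because each score enters the projection linearly, a user could raise or lower a single score at inference time and get a proportional change in the modulation. That is the kind of fine-grained knob this request is after. Which index corresponds to which property is not stated here and would have to come from the Empathic Insight Voice model card.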

It might also be very nice to use our BUD-E Whisper model to condition text-to-speech on free-form captions that describe the emotions in a voice.
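
A rough sketch of the data side, assuming captions are generated offline for each training clip and then paired with its transcript. `caption_fn` is a hypothetical stand-in for BUD-E Whisper inference (no real API call is shown), and the `<style>…</style><text>…</text>` prompt layout is an illustrative assumption, not a format any existing TTS model is known to use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass
class TTSExample:
    audio_path: str
    transcript: str
    style_caption: str  # free-form emotion description, e.g. from BUD-E Whisper


def build_caption_conditioned_examples(
    clips: Iterable[Tuple[str, str]],
    caption_fn: Callable[[str], str],  # hypothetical: audio path -> caption
) -> List[TTSExample]:
    """Pair each (audio_path, transcript) clip with a generated style caption."""
    examples = []
    for audio_path, transcript in clips:
        caption = caption_fn(audio_path).strip()
        if not caption:
            continue  # skip clips the captioner could not describe
        examples.append(TTSExample(audio_path, transcript, caption))
    return examples


def format_conditioning_prompt(style_caption: str, transcript: str) -> str:
    # Hypothetical prompt layout: the caption controls *how* the text is spoken.
    return f"<style>{style_caption.strip()}</style><text>{transcript.strip()}</text>"
```

At inference time, the same prompt format would accept a caption typed by the user instead of one produced by the captioner, which is what makes freestyle emotion descriptions usable as a control.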

https://x.com/laion_ai/status/1935792645143494926
https://laion.ai/blog/do-they-see-what-we-see/

https://huggingface.co/laion/Empathic-Insight-Face-Small
https://huggingface.co/laion/BUD-E-Whisper

Read the Papers
EmoNet Face: https://arxiv.org/abs/2505.20033
EmoNet Voice: https://arxiv.org/abs/2506.09827

Metadata

Assignees: No one assigned
Labels: enhancement (New feature or request)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
