✨ Interpretable Visual Emotion Analysis

EmoVerse

An MLLM-driven emotion representation dataset that connects categorical emotion, continuous affective space, textual attribution, and grounded visual evidence.

EmoVerse sample annotations with CES, DES, B-A-S, and visual grounding

📚 Dataset

Designed for emotion reasoning, not only emotion labels.

219K+ images in the arXiv preprint
8 CES categories
1024 DES dimensions
Background-Attribute-Subject (B-A-S) triplet attribution

The public dataset release is hosted on Hugging Face Datasets under CC BY-NC 4.0 for non-commercial research use. Large split archive files may appear on the Hub in stages while the upload is being processed.

🧩 Schema

Each sample records what is felt, why it is felt, and where the evidence appears.

{
  "emotion": "Amusement",
  "confidence": 8,
  "background": "snow-covered plain",
  "attribute": "excited",
  "subject": "person",
  "B-A-S": "snow-covered plain-excited-person",
  "DES": [10.3387, 2.5036, "..."],
  "bbox": [{"x1": 47, "y1": 10, "x2": 421, "y2": 559}]
}
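To make the relationship between the fields concrete, here is a minimal Python sketch that parses the sample record above and checks two things: that `B-A-S` matches the `background`, `attribute`, and `subject` fields joined with hyphens, and what size each `bbox` entry has. The helper names `bas_triplet` and `box_size` are hypothetical and not part of any EmoVerse tooling. The sketch also assumes that `(x1, y1)` and `(x2, y2)` are opposite box corners in pixel coordinates, which the schema does not state explicitly.

```python
import json

# Sample record copied from the schema above. The "..." entry marks the
# elided remainder of the DES vector and is kept as-is.
SAMPLE_JSON = '''
{
  "emotion": "Amusement",
  "confidence": 8,
  "background": "snow-covered plain",
  "attribute": "excited",
  "subject": "person",
  "B-A-S": "snow-covered plain-excited-person",
  "DES": [10.3387, 2.5036, "..."],
  "bbox": [{"x1": 47, "y1": 10, "x2": 421, "y2": 559}]
}
'''

sample = json.loads(SAMPLE_JSON)


def bas_triplet(record: dict) -> str:
    """Join the attribution fields in background-attribute-subject order."""
    return "-".join([record["background"], record["attribute"], record["subject"]])


def box_size(box: dict) -> tuple[int, int]:
    """Return (width, height), assuming opposite corners in pixel coordinates."""
    return box["x2"] - box["x1"], box["y2"] - box["y1"]


# Keep only the numeric DES entries; the placeholder string is skipped.
des_values = [v for v in sample["DES"] if isinstance(v, (int, float))]

print(sample["emotion"], sample["confidence"])
print(bas_triplet(sample) == sample["B-A-S"])
for box in sample["bbox"]:
    print(box_size(box))
print(des_values)
```

Reading the three fields separately is safer than splitting the `B-A-S` string on hyphens, because a value such as `snow-covered plain` contains a hyphen itself.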

🛠️ Pipeline

A scalable annotation workflow with verification built in.

  1. 🌈 Collect and generate affective images
  2. ✍️ Annotate background, attribute, subject, and emotion
  3. 🔍 Cross-check with emotion-specific models
  4. 📍 Ground subjects with boxes and masks
  5. ✅ Verify with Critic Agent and manual sampling

Overall EmoVerse framework

🔖 Citation

If you use EmoVerse in your work, please cite:

@misc{guo2025emoverse,
  title         = {EmoVerse: A MLLMs-Driven Emotion Representation Dataset for Interpretable Visual Emotion Analysis},
  author        = {Yijie Guo and Dexiang Hong and Weidong Chen and Zihan She and Cheng Ye and Xiaojun Chang and Zhendong Mao},
  year          = {2025},
  eprint        = {2511.12554},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  doi           = {10.48550/arXiv.2511.12554},
  url           = {https://arxiv.org/abs/2511.12554}
}