Key Points

1. The research introduces the FindingEmo dataset, which includes annotations for 25,000 images tailored to Emotion Recognition. This dataset focuses on complex scenes depicting multiple people in various naturalistic, social settings, going beyond the traditional focus on faces or single individuals.

2. The dataset contains annotations for Valence, Arousal, and Emotion label, gathered through the Prolific platform using a custom web interface. It also includes the list of URLs pointing to the original images, as well as the associated source code.

3. The research highlights the significance of accounting for context in emotion recognition, noting that the role of context is well acknowledged in psychology, and stresses the importance of understanding the emotional content of entire scenes rather than individual expressions.

4. The paper discusses the challenges and complexities of affective computing, particularly in the subtask of Emotion Recognition, and the application of computer vision techniques in psychology and human-computer interaction.

5. The research compares the FindingEmo dataset with other existing datasets for emotion recognition, such as EMOTIC and CAER-S, and discusses the limitations and biases often observed in existing datasets, emphasizing the need for more diverse and balanced emotion datasets.

6. Baseline results for Emotion, Arousal, and Valence prediction are presented, demonstrating the difficulty of the tasks. The research explores transfer learning from popular ImageNet-based ANN architectures and describes the challenges of predicting emotions from images depicting social settings.

7. The paper discusses the training of multi-stream models by applying late fusion with Facial Emotion Recognition predictions, EmoNet predictions, and Places365 predictions or features, to improve the baseline performance for Emotion recognition tasks.

8. The research provides insights into the annotation interface, the data collection process, and the challenges encountered during dataset creation. The authors also open-source the annotation interface and the code for model training.

9. The paper concludes by emphasizing the complexity of the dataset and the difficulty of the tasks, highlighting the availability of the FindingEmo dataset and baseline results for further research and development in the field of Emotion Recognition and Social Cognition.

Summary

The paper introduces the FindingEmo dataset, a collection of annotations for 25,000 images tailored to Emotion Recognition. Unlike existing datasets, FindingEmo focuses on complex scenes with multiple individuals in naturalistic social settings, rather than solely on faces or single individuals. The annotations cover Valence, Arousal, and Emotion label, and were gathered using the Prolific platform. The paper also provides the list of URLs pointing to the original images and the associated source code, and it details the dataset creation process, including image collection and annotation gathering.
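
To make the shape of such an annotation concrete, here is a minimal Python sketch of one record and a simple aggregate over several records. The class name `Annotation`, the field names, the numeric types, the example URLs, and the sample values are all hypothetical and chosen only for illustration; they are not taken from the released dataset files.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Annotation:
    """One annotated image (hypothetical layout, not the official schema)."""
    image_url: str   # URL pointing to the original image
    valence: float   # how pleasant or unpleasant the scene is rated
    arousal: float   # how calm or excited the scene is rated
    emotion: str     # categorical emotion label


def mean_valence_by_emotion(annotations):
    """Group annotations by emotion label and average their valence."""
    grouped = defaultdict(list)
    for a in annotations:
        grouped[a.emotion].append(a.valence)
    return {emotion: mean(values) for emotion, values in grouped.items()}


# Made-up sample records, for illustration only.
samples = [
    Annotation("https://example.com/img1.jpg", 2.0, 3.0, "joy"),
    Annotation("https://example.com/img2.jpg", -1.0, 4.0, "anger"),
    Annotation("https://example.com/img3.jpg", 1.0, 2.0, "joy"),
]
summary = mean_valence_by_emotion(samples)
```

Keeping the URL in the record mirrors the fact that the dataset distributes links to the original images rather than the images themselves.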

Dataset Description and Model Analysis

The dataset targets higher-order social cognition and presents a challenging task for Emotion Recognition. Baseline results for Emotion, Arousal, and Valence prediction are presented, demonstrating that these tasks are complex and difficult. The paper also discusses the dataset's annotation interface and the openly available code for model training.

Additionally, the paper explores transfer learning from popular ImageNet-based ANN architectures and investigates the effect of merging the features and predictions of several models. In particular, it applies late fusion with Facial Emotion Recognition predictions, EmoNet predictions, and Places365 predictions or features in an attempt to improve on the baseline results. The findings indicate that improving upon the baseline is challenging, that adding facial emotion features has a notable effect on performance, and that the model captures salient information different from that of standard ImageNet networks. The paper provides a comprehensive overview of the dataset creation, annotation, and validation processes and offers a starting point for research in Emotion Recognition.
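
As a rough illustration of late fusion, the following Python sketch concatenates the output vectors of two separately run streams and passes them through a single linear layer followed by a softmax. This is one common form of late fusion and only a sketch under assumptions: the function names, the three-class setup, the stream outputs, and the hand-picked weights are hypothetical, and the paper's actual architecture and training procedure may differ.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_outputs, weights, bias):
    """Concatenate per-stream outputs, then apply one linear layer and softmax.

    stream_outputs: list of vectors, one per stream (e.g. scene model, face model)
    weights: one row per output class, each row as long as the concatenated vector
    bias: one value per output class
    """
    fused = [v for out in stream_outputs for v in out]
    logits = [
        sum(w * x for w, x in zip(row, fused)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Hypothetical 3-class outputs from two streams (made-up numbers).
scene_stream = [0.2, 0.5, 0.3]   # e.g. an ImageNet-based baseline
face_stream = [0.1, 0.7, 0.2]    # e.g. a facial emotion recognition model

# Hand-picked weights that simply add matching classes across streams;
# in practice these would be learned from the training data.
weights = [
    [1, 0, 0, 1, 0, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 1, 0, 0, 1],
]
bias = [0.0, 0.0, 0.0]

probs = late_fusion([scene_stream, face_stream], weights, bias)
predicted_class = probs.index(max(probs))
```

Because the streams are combined only at their outputs, each one can be trained or reused independently, and only the small fusion layer needs to be fitted on the target data.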

Reference: https://arxiv.org/abs/2402.01355