[Paper] [Slide]
The ideas in this paper were shared with another person and were published at the 2023 ICIP conference and the ASK conference, but I am not listed as an author, even though Proposed 1 and Proposed 2 were my ideas. So if you read a paper or a thesis with similar ideas, please note that I am not the one who copied them :)
Abstract:
Facial emotion recognition has recently attracted much attention, and deep learning methods with attention mechanisms have been incorporated to improve it. Fusion approaches have further improved accuracy by combining multiple types of information. This work proposes a fusion network with self-attention and local attention mechanisms built on a multi-layer perceptron. The network extracts discriminative features from facial images using models pre-trained on the RAF-DB dataset. Our method outperforms the other fusion methods on RAF-DB with impressive results.
Proposed Method
Summary: We employ a fusion method to combine the final features of two pre-trained emotion recognition models. The goal of this combination is to minimize the weaknesses of each model while leveraging their strengths. We first concatenate the two feature vectors and pass them through a multi-layer perceptron (MLP) to generate a new feature with the same size as each input. We then average the features, pass the result through the self-attention block and the local attention block, and finally through a fully connected network to classify emotions.
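The data flow above (concatenate → MLP → average → self-attention + local attention → fully connected classifier) can be sketched as follows. This is only an illustration under loud assumptions: the attention blocks are simplified channel-attention stand-ins, the feature size is a toy value, and random matrices replace the learned weights and the backbone features — it is not the paper's implementation.

```python
import math
import random

random.seed(0)

def rand_mat(rows, cols):
    # Random stand-in for a learned weight matrix.
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def linear(x, w):
    # y = W x, with w stored as an out_dim x in_dim list of lists.
    return [sum(wi[j] * x[j] for j in range(len(x))) for wi in w]

def softmax(v):
    m = max(v)
    e = [math.exp(t - m) for t in v]
    s = sum(e)
    return [t / s for t in e]

D = 8            # toy feature size (real backbones emit e.g. 512-d features)
NUM_CLASSES = 7  # RAF-DB covers 7 basic emotion classes

# Stand-ins for the final features of the two pre-trained backbones.
f1 = [random.random() for _ in range(D)]
f2 = [random.random() for _ in range(D)]

# 1) Concatenate the two features and project back to size D with an MLP layer.
fused = linear(f1 + f2, rand_mat(D, 2 * D))

# 2) Average the features (one plausible reading: mean of f1, f2 and the MLP output).
avg = [(a + b + c) / 3.0 for a, b, c in zip(f1, f2, fused)]

# 3) Self-attention block, simplified here to global channel attention:
#    softmax weights over all D channels, applied elementwise.
self_w = softmax(linear(avg, rand_mat(D, D)))
global_out = [a * w * D for a, w in zip(avg, self_w)]

# 4) Local attention block: same idea, but each channel's weight is
#    normalized only within a small sliding window of neighbours.
scores = linear(avg, rand_mat(D, D))
local_out = []
for i in range(D):
    lo, hi = max(0, i - 1), min(D, i + 2)
    w = softmax(scores[lo:hi])[i - lo]
    local_out.append(avg[i] * w * (hi - lo))

# 5) Fully connected classifier over both attention outputs.
logits = linear(global_out + local_out, rand_mat(NUM_CLASSES, 2 * D))
probs = softmax(logits)
print(len(probs), round(sum(probs), 6))
```

In the real model the attention blocks and MLP are trained end to end; the fixed random weights here only demonstrate the shapes and the order of operations.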
Experiments
| Fusion Method | Model 1 | Model 2 | RAF-DB (%) |
|---|---|---|---|
| Late Fusion | ResNet18 | ResNet34 | 86.35 |
| | VGG11 | VGG13 | 86.08 |
| | VGG11 | ResNet34 | 86.08 |
| Early Fusion | ResNet18 | ResNet34 | 86.66 |
| | VGG13 | ResNet34 | 85.49 |
| | VGG11 | ResNet34 | 86.08 |
| Joint Fusion | ResNet18 | ResNet34 | 86.05 |
| | VGG13 | ResNet34 | 86.63 |
| | VGG11 | ResNet34 | 86.40 |
| Fusion attention (ours) | ResNet18 | ResNet34 | 90.95 |
| | VGG13 | ResNet34 | 90.92 |