A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images

dc.contributor.author: Siddiqui, Mohammad Faridul Haque (en_US)
dc.date.accessioned: 2023-02-27T20:28:40Z
dc.date.available: 2023-02-27T20:28:40Z
dc.date.issued: 2023-03-02
dc.description: Very few infrared (IR) emotion databases are available, and most of them did not meet the requirements of the framework we were developing. To address this, we created our own database of visible and IR images, the VIRI database, at The University of Toledo. It was designed to overcome the limitations of existing IR databases and contains facial expressions captured in both visible and IR formats against uncontrolled, in-the-wild backgrounds. The images were collected from on-campus students who consented to be included in the study. The VIRI database covers five expressions (happy, sad, angry, surprised, and neutral) captured from 110 subjects (70 males and 40 females), yielding 550 images in radiometric JPEG format. Each radiometric JPEG holds visible, infrared, and MSX renderings, and the VIRI database retains all three forms. (en_US)
dc.description.abstract: The exigency of emotion recognition is pushing the envelope for meticulous strategies to discern actual emotions through superior multimodal techniques. This work presents a multimodal automatic emotion recognition (AER) framework capable of differentiating between expressed emotions with high accuracy. The contribution is an ensemble-based approach to AER through the fusion of visible and infrared (IR) images with speech. The framework is implemented in two layers: the first layer detects emotions from single modalities, while the second layer combines the modalities and classifies the emotion. Convolutional Neural Networks (CNNs) were used for feature extraction and classification. A hybrid fusion approach comprising early (feature-level) and late (decision-level) fusion was applied to combine the features and the decisions at different stages. The output of a CNN trained on voice samples from the RAVDESS database was combined with the image classifier's output using decision-level fusion to obtain the final decision. The framework achieved an accuracy of 86.36%, with comparable recall (0.86), precision (0.88), and F-measure (0.87). A comparison with contemporary work confirmed the competitiveness of the framework, whose distinguishing feature is attaining this accuracy in wild backgrounds and under light-invariant conditions. (en_US)
dc.identifier.uri: https://hdl.handle.net/11310/5063
dc.language.iso: en_US (en_US)
dc.subject: 2023 Faculty and Student Research Poster Session and Research Fair (en_US)
dc.subject: West Texas A&M University (en_US)
dc.subject: College of Engineering (en_US)
dc.subject: Poster (en_US)
dc.subject: Multimodal automatic emotion recognition (en_US)
dc.subject: Emotions (en_US)
dc.subject: Convolutional Neural Networks (en_US)
dc.title: A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images (en_US)
dc.type: Presentation (en_US)
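For illustration of the decision-level (late) fusion step described in the abstract: the minimal Python/PyTorch sketch below assumes small stand-in CNN classifiers for the visible, infrared, and speech (spectrogram) modalities, the five emotion classes listed in the VIRI description, and hypothetical fusion weights. It is not the authors' implementation, only a sketch of how per-modality class probabilities could be combined into a final decision.

# Minimal sketch of decision-level (late) fusion across three modalities.
# The CNNs, input sizes, and fusion weights below are illustrative assumptions.
import torch
import torch.nn as nn

NUM_EMOTIONS = 5  # happy, sad, angry, surprised, neutral

class SmallCNN(nn.Module):
    """Stand-in single-modality classifier (visible, IR, or speech spectrogram)."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(32 * 4 * 4, NUM_EMOTIONS)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def decision_level_fusion(logits_per_modality, weights):
    """Late fusion: weighted average of per-modality class probabilities."""
    probs = [w * torch.softmax(l, dim=1) for l, w in zip(logits_per_modality, weights)]
    return torch.stack(probs).sum(dim=0)

# Usage with random tensors standing in for visible, IR, and speech-spectrogram inputs.
visible_net, ir_net, speech_net = SmallCNN(3), SmallCNN(1), SmallCNN(1)
vis = torch.randn(2, 3, 64, 64)
ir = torch.randn(2, 1, 64, 64)
spec = torch.randn(2, 1, 64, 64)
fused = decision_level_fusion(
    [visible_net(vis), ir_net(ir), speech_net(spec)],
    weights=[0.4, 0.3, 0.3],  # assumed weights, not taken from the poster
)
predicted_emotion = fused.argmax(dim=1)  # index into the five emotion classes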

Files

Original bundle

Name: Siddiqui, Mohammad Faridul Haque.pdf
Size: 38.1 MB
Format: Adobe Portable Document Format
Description: Poster

Name: Siddiqui, Mohammad Faridul Haque.JPG
Size: 107.89 KB
Format: Joint Photographic Experts Group/JPEG File Interchange Format (JFIF)
Description: Poster

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: