A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images

dc.contributor.author: Siddiqui, Mohammad Faridul Haque (en_US)
dc.date.accessioned: 2023-02-27T20:28:40Z
dc.date.available: 2023-02-27T20:28:40Z
dc.date.issued: 2023-03-02
dc.description: Very few infrared (IR) emotion databases are available, and most of them did not meet the requirements of the framework we were developing. To address this, we created our own database of visible and IR images, the VIRI database, at The University of Toledo. It was designed to overcome the limitations of existing IR databases and contains facial expressions captured in both visible and IR formats against uncontrolled, in-the-wild backgrounds. The images were collected from on-campus students who consented to be included in the study. The VIRI database covers five expressions (happy, sad, angry, surprised, and neutral) captured from 110 subjects (70 males and 40 females), yielding 550 images in radiometric JPEG format. Each radiometric JPEG holds visible, infrared, and MSX renderings, and the VIRI database retains all three forms. (en_US)
dc.description.abstract: The exigency of emotion recognition is pushing the envelope for meticulous strategies to discern actual emotions through superior multimodal techniques. This work presents a multimodal automatic emotion recognition (AER) framework capable of differentiating between expressed emotions with high accuracy. The contribution is an ensemble-based approach to AER through the fusion of visible and infrared (IR) images with speech. The framework is implemented in two layers: the first layer detects emotions from single modalities, while the second layer combines the modalities and classifies the emotion. Convolutional Neural Networks (CNNs) were used for feature extraction and classification. A hybrid fusion approach comprising early (feature-level) and late (decision-level) fusion was applied to combine the features and the decisions at different stages. The output of a CNN trained on voice samples from the RAVDESS database was combined with the image classifier's output using decision-level fusion to obtain the final decision. The framework achieved an accuracy of 86.36%, with comparable recall (0.86), precision (0.88), and F-measure (0.87). A comparison with contemporary work confirmed the competitiveness of the framework, whose distinguishing feature is attaining this accuracy in wild backgrounds and under light-invariant conditions. (en_US)
dc.identifier.uri: https://hdl.handle.net/11310/5063
dc.language.iso: en_US (en_US)
dc.subject: 2023 Faculty and Student Research Poster Session and Research Fair (en_US)
dc.subject: West Texas A&M University (en_US)
dc.subject: College of Engineering (en_US)
dc.subject: Poster (en_US)
dc.subject: Multimodal automatic emotion recognition (en_US)
dc.subject: Emotions (en_US)
dc.subject: Convolutional Neural Networks (en_US)
dc.title: A Multimodal Facial Emotion Recognition Framework through the Fusion of Speech with Visible and Infrared Images (en_US)
dc.type: Presentation (en_US)
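For illustration of the decision-level (late) fusion step described in the abstract: the minimal Python/PyTorch sketch below assumes small stand-in CNN classifiers for the visible, infrared, and speech (spectrogram) modalities, the five emotion classes listed in the VIRI description, and hypothetical fusion weights. It is not the authors' implementation, only a sketch of how per-modality class probabilities could be combined into a final decision.

# Minimal sketch of decision-level (late) fusion across three modalities.
# The CNNs, input sizes, and fusion weights below are illustrative assumptions.
import torch
import torch.nn as nn

NUM_EMOTIONS = 5  # happy, sad, angry, surprised, neutral

class SmallCNN(nn.Module):
    """Stand-in single-modality classifier (visible, IR, or speech spectrogram)."""
    def __init__(self, in_channels):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(4),
        )
        self.classifier = nn.Linear(32 * 4 * 4, NUM_EMOTIONS)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))

def decision_level_fusion(logits_per_modality, weights):
    """Late fusion: weighted average of per-modality class probabilities."""
    probs = [w * torch.softmax(l, dim=1) for l, w in zip(logits_per_modality, weights)]
    return torch.stack(probs).sum(dim=0)

# Usage with random tensors standing in for visible, IR, and speech-spectrogram inputs.
visible_net, ir_net, speech_net = SmallCNN(3), SmallCNN(1), SmallCNN(1)
vis = torch.randn(2, 3, 64, 64)
ir = torch.randn(2, 1, 64, 64)
spec = torch.randn(2, 1, 64, 64)
fused = decision_level_fusion(
    [visible_net(vis), ir_net(ir), speech_net(spec)],
    weights=[0.4, 0.3, 0.3],  # assumed weights, not taken from the poster
)
predicted_emotion = fused.argmax(dim=1)  # index into the five emotion classes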

Files

Original bundle

Name: Siddiqui, Mohammad Faridul Haque.pdf
Size: 38.1 MB
Format: Adobe Portable Document Format
Description: Poster

Name: Siddiqui, Mohammad Faridul Haque.JPG
Size: 107.89 KB
Format: Joint Photographic Experts Group/JPEG File Interchange Format (JFIF)
Description: Poster

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission
Description: