A novel multimodal approach for emotion recognition using Text, Speech, and Facial Expression data

Authors

  • Nikita Joshi, M.Tech. Scholar, Dept. of CSE, SSIPMT, Raipur
  • Dr. Rakesh Kumar Khare, Associate Professor, Dept. of CSE, SSIPMT, Raipur

DOI:

https://doi.org/10.48047/pgff9y84

Keywords:

Multimodal Emotion Recognition, Human-Computer Interaction, Feature Fusion, Modality Alignment, Attention Mechanism, Text, Speech, Facial Expression, Deep Learning, Emotional Adaptation, Robust Emotion Classification

Abstract

Multimodal emotion recognition (MER) is essential for improving human-computer interaction by allowing systems to understand and respond to human emotions. This research introduces a method for MER that combines three modalities, text, speech, and facial expression, within a framework built on modality-specific feature selection, independent per-modality classification, and learnable attention-based feature fusion. The proposed approach addresses the difficulties of aligning and integrating features from multiple modalities while maintaining robustness and interpretability, and it adapts to varying emotional displays by exploiting the strengths of each modality. Comprehensive experiments demonstrate the efficacy of the proposed methodology, showing substantial improvements in classification accuracy and robustness over conventional unimodal and baseline multimodal systems. This work advances the development of more empathic and contextually aware human-computer interaction systems, with potential applications in virtual assistants, mental health monitoring, and customer service.
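To make the architecture described in the abstract concrete, the minimal sketch below shows one way modality-specific projection, independent per-modality classification, and learnable attention-based fusion could be wired together in PyTorch. All dimensions, layer choices, and names such as AttentionFusionMER are illustrative assumptions and do not represent the authors' published implementation.

import torch
import torch.nn as nn


class AttentionFusionMER(nn.Module):
    """Sketch of attention-weighted late fusion over text, speech, and face features.
    Feature dimensions and layer sizes below are assumed, not taken from the paper."""

    def __init__(self, text_dim=768, speech_dim=128, face_dim=512,
                 hidden_dim=256, num_classes=6):
        super().__init__()
        # Modality-specific projections standing in for feature selection.
        self.text_proj = nn.Linear(text_dim, hidden_dim)
        self.speech_proj = nn.Linear(speech_dim, hidden_dim)
        self.face_proj = nn.Linear(face_dim, hidden_dim)
        # Independent per-modality classifiers (usable as auxiliary heads).
        self.text_clf = nn.Linear(hidden_dim, num_classes)
        self.speech_clf = nn.Linear(hidden_dim, num_classes)
        self.face_clf = nn.Linear(hidden_dim, num_classes)
        # Learnable attention that scores each modality embedding.
        self.attn = nn.Linear(hidden_dim, 1)
        # Final classifier over the fused representation.
        self.fusion_clf = nn.Linear(hidden_dim, num_classes)

    def forward(self, text_feat, speech_feat, face_feat):
        # Project each modality into a shared hidden space: (batch, 3, hidden_dim).
        h = torch.stack([
            torch.tanh(self.text_proj(text_feat)),
            torch.tanh(self.speech_proj(speech_feat)),
            torch.tanh(self.face_proj(face_feat)),
        ], dim=1)
        # Per-modality predictions from the independent classifiers.
        uni_logits = [self.text_clf(h[:, 0]),
                      self.speech_clf(h[:, 1]),
                      self.face_clf(h[:, 2])]
        # Attention weights over the three modalities, then a weighted sum.
        weights = torch.softmax(self.attn(h), dim=1)   # (batch, 3, 1)
        fused = (weights * h).sum(dim=1)               # (batch, hidden_dim)
        return self.fusion_clf(fused), uni_logits, weights


if __name__ == "__main__":
    model = AttentionFusionMER()
    logits, uni_logits, weights = model(torch.randn(4, 768),
                                        torch.randn(4, 128),
                                        torch.randn(4, 512))
    print(logits.shape, weights.squeeze(-1)[0])

In such a design, the per-modality logits can provide auxiliary training signals, while the learned attention weights expose how much each modality contributed to a prediction, which is one way the interpretability and robustness claims in the abstract could be realized.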

References

Li, Q., Gao, Y., Wen, Y., Wang, C., & Li, Y. (2024). Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition. Proc. Interspeech 2024, 4663–4667. https://doi.org/10.21437/interspeech.2024-1462

Wang, X., Zhao, S., Sun, H., Wang, H., Zhou, J., & Qin, Y. (2024). Enhancing Multimodal Emotion Recognition through Multi-Granularity Cross-Modal Alignment. https://doi.org/10.48550/arxiv.2412.20821

Zhu, J., Zhu, X., Wang, S., Wang, T., Huang, J., & Wang, R. (2024). Multi-Modal Emotion Recognition Using Tensor Decomposition Fusion and Self-Supervised Multi-Tasking. https://doi.org/10.21203/rs.3.rs-3916468/v1

Hou, M., Zhang, Z., Li, C., & Lu, G. (2023). Semantic Alignment Network for Multi-modal Emotion Recognition. IEEE Transactions on Circuits and Systems for Video Technology, 1. https://doi.org/10.1109/tcsvt.2023.3247822

Zhao, Z., Wang, Y., & Wang, Y. (2023). Knowledge-aware Bayesian Co-attention for Multimodal Emotion Recognition. IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). https://doi.org/10.1109/icassp49357.2023.10095798

Wang, Y., Li, D., & Shen, J. (2024). Inter-Modality and Intra-Sample Alignment for Multi-Modal Emotion Recognition. https://doi.org/10.1109/icassp48485.2024.10446571

Wu, Y., Zhang, S., & Li, P. (2024). Improvement of Multimodal Emotion Recognition Based on Temporal-Aware Bi-Direction Multi-Scale Network and Multi-Head Attention Mechanisms. Applied Sciences. https://doi.org/10.3390/app14083276

Wang, X., Ran, F., Hao, Y., Zang, H. L., & Yang, Q. (2024). Sequence Modeling and Feature Fusion for Multimodal Emotion Recognition. https://doi.org/10.1109/iccect60629.2024.10546216

Shi, X., Li, X., & Toda, T. (2024). Multimodal Fusion of Music Theory-Inspired and Self-Supervised Representations for Improved Emotion Recognition. Proc. Interspeech 2024, 3724–3728. https://doi.org/10.21437/interspeech.2024-2350

Zhang, Y., Ding, K., Wang, X., Liu, Y., & Bao, S. (2024). Multimodal Emotion Reasoning Based on Multidimensional Orthogonal Fusion. https://doi.org/10.1109/icipmc62364.2024.10586672

Li, X., Liu, J., Xie, Y.-P., Gong, P., Zhang, X., & He, H. (2023). MAGDRA: A Multi-modal Attention Graph Network with Dynamic Routing-By-Agreement for multi-label emotion recognition. Knowledge-Based Systems, 283, 111126. https://doi.org/10.1016/j.knosys.2023.111126

He, J., Wu, M., Li, M., Zhu, X., & Ye, F. (2022). Multilevel Transformer for Multimodal Emotion Recognition. arXiv preprint arXiv:2211.07711. https://doi.org/10.48550/arXiv.2211.07711

Sun, Y., Cheng, D., Chen, Y., & He, Z. (2023). DynamicMBFN: Dynamic Multimodal Bottleneck Fusion Network for Multimodal Emotion Recognition. 639–644. https://doi.org/10.1109/isctis58954.2023.10213035

Chen, Y., Luo, H., Chen, J., & Wang, Y. (2024). Multimodal Emotion Recognition Algorithm Based on Graph Attention Network. https://doi.org/10.1109/ainit61980.2024.10581429

Wu, W., Chen, D., & Fang, P. (2024). A Two-Stage Multi-Modal Multi-Label Emotion Recognition Decision System Based on GCN. International Journal of Decision Support System Technology, 16(1), 1–17. https://doi.org/10.4018/ijdsst.352398

Wang, H. (2024). Optimizing Multimodal Emotion Recognition: Evaluating the Impact of Speech, Text, and Visual Modalities. 81–85. https://doi.org/10.1109/icedcs64328.2024.00019

Published

2024-12-22

How to Cite

Joshi, N., & Khare, R. K. (2024). A novel multimodal approach for emotion recognition using Text, Speech, and Facial Expression data. Cuestiones de Fisioterapia, 53(03), 5303–5314. https://doi.org/10.48047/pgff9y84