UNIVERSITY OF HERTFORDSHIRE COMPUTER SCIENCE RESEARCH COLLOQUIUM

presents

"Speech Emotion Recognition Using Transformer-based Models"

Dr Yi Sun (University of Hertfordshire)

21 February 2024, 13:00-14:00
Room C154

Everyone is welcome to attend.

Abstract:

Creating speech emotion recognition models that match the human ability to recognise emotions is a long-standing challenge in speech technology. As transformer-based architectures have recently become state-of-the-art for many natural language processing applications, we investigated their suitability for acoustic emotion recognition and compared them with the well-known AlexNet convolutional approach, using several publicly available speech emotion corpora. Experimental results demonstrate the efficacy of the different architectural approaches for particular emotions: the transformer-based models outperform their convolutional counterparts, yielding F1 scores of 70.33% to 75.76%.

We then further investigated the generalisation ability of one transformer-based model, HuBERT. We applied the HuBERT model to a combined training set of five publicly available datasets, and conducted cross-corpus testing on the Strong Emotion (StrEmo) Dataset (a natural dataset) and two publicly available datasets (SAVEE and 20% of CMU-MOSEI), which were not used in the training stage. Our best result achieved an F1 score of 78% over the three test sets, with an F1 score of 86% for StrEmo specifically. We observed that the fine-tuned model was more effective in analysing stronger emotional data. However, it also showed a higher tendency to misclassify negative emotion as neutral, indicating a need for further investigation with more data in the future.

---------------------------------------------------
Hertfordshire Computer Science Research Colloquium
http://cs-colloq.cs.herts.ac.uk
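
The abstract reports results as F1 scores averaged over emotion classes. As a minimal illustrative sketch (the emotion labels and predictions below are made up for demonstration and are not from the talk's datasets), per-class F1 combines precision and recall, and the macro score averages over classes:

```python
def macro_f1(y_true, y_pred, labels):
    """Macro-averaged F1: compute F1 per class, then take the unweighted mean."""
    f1s = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        f1s.append(f1)
    return sum(f1s) / len(f1s)

# Hypothetical label set and predictions, for illustration only.
labels = ["angry", "happy", "neutral", "sad"]
y_true = ["angry", "happy", "neutral", "sad", "neutral", "angry"]
y_pred = ["angry", "happy", "neutral", "neutral", "neutral", "sad"]
print(f"macro F1: {macro_f1(y_true, y_pred, labels):.4f}")
```

Note how, in this toy example, a "sad" utterance misclassified as "neutral" drags the sad-class F1 to zero; the abstract's observation that negative emotions tend to be misclassified as neutral would show up in exactly this way.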