Performance analysis of an algorithm for estimating emotions based on the trajectory of the fundamental frequency

Bojan Prlinčević; Zoran Milivojević; Dijana Kostić

doi:10.46793/ICEMIT23.287P

Authors

Bojan Prlinčević, Kosovo and Metohija Academy of Applied Studies, Leposavić, Serbia
Zoran Milivojević, Academy of Applied Technical and Preschool Studies Niš, Serbia
Dijana Kostić, Šargan inženjering d.o.o. Niš, Serbia

Keywords:

fundamental frequency, emotion, emotional state, confusion matrix

Abstract

This study analyzes an algorithm that estimates a speaker's emotions using speech analysis techniques. The algorithm focuses on the trajectory of the fundamental frequency F₀(t) to determine a range of emotional states. The analysis is carried out in the (F₀, s²) and (F₀, T) planes. During the initial training phase, a decision criterion is established to differentiate between emotional states. This criterion is defined based on the analysis of test signals and is positioned within the designated planes. The performance of the algorithm is evaluated during the testing phase using a confusion matrix, which allows a precise assessment of the algorithm's ability to detect emotional states. Furthermore, a comparative study is undertaken, comparing the outcomes for the identification of the emotional state pairs Normal/Anger, Normal/Boredom, and Normal/Anxiety. To give a comprehensive picture of the algorithm's effectiveness in identifying emotional states, the results are presented in tables and graphs.
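
The train-then-test procedure described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state the form of the decision criterion, so the midpoint threshold on mean F₀ used here is an assumption, as are all function names and the toy F₀ tracks (in Hz, with 0 marking an unvoiced frame). The variance s² is computed only to show the second coordinate of the (F₀, s²) plane; the sketch does not use the (F₀, T) plane. The direction of the comparison (a higher mean F₀ for the non-normal state) is also assumed and would not hold for every emotion pair.

```python
from statistics import mean, pvariance


def f0_features(f0_track):
    """Return (mean F0, variance s^2) over voiced frames (F0 > 0)."""
    voiced = [f for f in f0_track if f > 0]
    return mean(voiced), pvariance(voiced)


def fit_threshold(normal_tracks, emotional_tracks):
    """Assumed criterion: midpoint between the class averages of mean F0."""
    normal_avg = mean(f0_features(t)[0] for t in normal_tracks)
    emotional_avg = mean(f0_features(t)[0] for t in emotional_tracks)
    return (normal_avg + emotional_avg) / 2


def classify(f0_track, threshold):
    """Label a track by comparing its mean F0 with the threshold."""
    f0_mean, _ = f0_features(f0_track)
    return "emotional" if f0_mean > threshold else "normal"


def confusion_matrix(true_labels, predicted_labels,
                     classes=("normal", "emotional")):
    """Rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


# Toy data (invented for illustration only, not measured values).
train_normal = [[110, 112, 0, 115], [120, 118, 122, 0]]
train_emotional = [[180, 210, 0, 190], [200, 230, 170, 0]]
test_tracks = [[105, 0, 108], [150, 160, 170], [185, 0, 195], [140, 150, 0]]
test_labels = ["normal", "normal", "emotional", "emotional"]

threshold = fit_threshold(train_normal, train_emotional)
predicted = [classify(t, threshold) for t in test_tracks]
matrix = confusion_matrix(test_labels, predicted)
accuracy = sum(matrix[i][i] for i in range(len(matrix))) / len(test_labels)
print(matrix, accuracy)
```

The diagonal of the matrix counts correct decisions and the off-diagonal entries count confusions between the two states, so summary measures such as accuracy follow directly from it.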