Performance analysis of an algorithm for estimating emotions based on the trajectory of the fundamental frequency
DOI:
https://doi.org/10.46793/ICEMIT23.287P
Keywords:
fundamental frequency, emotion, emotional state, confusion matrix
Abstract
This study analyzes an algorithm that estimates a speaker's emotional state using speech analysis techniques. The algorithm analyzes the trajectory of the fundamental frequency F0(t) to discriminate between emotional states, with the analysis carried out in the planes (F0, s2) and (F0, T). In the training phase, a decision criterion for differentiating between emotional states is established. This criterion is defined based on the analysis of test signals and is positioned within these planes. In the testing phase, the performance of the algorithm is evaluated using a confusion matrix, which quantifies the algorithm's ability to detect emotional states. A comparative study is also carried out for the state pairs Normal/Anger, Normal/Boredom, and Normal/Anxiety. The results are presented in tables and graphs.
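The train/test workflow outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that s2 denotes the variance of F0, that unvoiced frames are marked with 0 in the F0 track, and that a simple midpoint threshold on the mean F0 can stand in for the decision criterion, whose actual form in the (F0, s2) and (F0, T) planes is not given here. All function names and the label "Emotion" are hypothetical.

```python
from statistics import mean, pvariance


def features(f0_track):
    """Return (mean F0, variance of F0) over voiced frames of one utterance.

    Assumption: frames with F0 == 0 are unvoiced and are skipped.
    """
    voiced = [f for f in f0_track if f > 0]
    return mean(voiced), pvariance(voiced)


def fit_threshold(normal_feats, emotion_feats):
    """Training phase (stand-in criterion): midpoint between the class
    averages of mean F0, plus the side on which the emotional class lies."""
    mu_normal = mean(f[0] for f in normal_feats)
    mu_emotion = mean(f[0] for f in emotion_feats)
    return (mu_normal + mu_emotion) / 2, mu_emotion > mu_normal


def classify(feat, threshold, emotion_is_higher):
    """Assign a label by comparing the utterance's mean F0 to the threshold."""
    above = feat[0] > threshold
    return "Emotion" if above == emotion_is_higher else "Normal"


def confusion_matrix(true_labels, predicted_labels, classes=("Normal", "Emotion")):
    """Testing phase: counts[true][predicted] for every pair of classes."""
    counts = {t: {p: 0 for p in classes} for t in classes}
    for t, p in zip(true_labels, predicted_labels):
        counts[t][p] += 1
    return counts


if __name__ == "__main__":
    # Synthetic F0 tracks in Hz, invented purely for illustration.
    train_normal = [features(t) for t in ([110, 0, 112, 108], [120, 118, 0, 122])]
    train_emotion = [features(t) for t in ([200, 210, 0, 190], [220, 0, 230, 210])]
    threshold, higher = fit_threshold(train_normal, train_emotion)

    test_tracks = [[115, 0, 117], [205, 195], [170, 172]]
    test_truth = ["Normal", "Emotion", "Normal"]
    predicted = [classify(features(t), threshold, higher) for t in test_tracks]
    print(confusion_matrix(test_truth, predicted))
```

The diagonal entries of the returned matrix count correct decisions and the off-diagonal entries count confusions between the two states, so the same routine could be reused for each of the Normal/Anger, Normal/Boredom, and Normal/Anxiety pairs.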
License
Copyright (c) 2023 International Scientific Conference on Economy, Management and Information Technologies

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.