RESEARCH
Y. J. Kim, Dept. of Computer Engineering, Hanbat National University
International Journal
A Study on the Emotion Feature composed of the Mel-frequency Cepstral Coefficient and the Speech Speed (Scopus)
Youjung Ko, Insuk Hong, Hyunsoon Shin, Yoonjoong Kim
 

Through an experiment, this research introduces and verifies the usefulness of an emotion feature that uses prosody attributes such as loudness, pitch, and sound length to express the characteristics of emotion. Sound length is proportional to pronunciation duration and inversely proportional to the number of phonemic changes per unit time. Based on this fact, the speech speed and the emotion feature were computed as follows. First, a codebook was generated from the Mel-frequency Cepstral Coefficient (MFCC) vectors of the provided speech data. Second, the MFCC vectors of the speech signal were vector-quantized with this codebook to produce a quantized sequence. Third, this sequence was treated as a phoneme sequence, and the speech speed was computed by normalizing the number of phoneme changes within each window. Fourth, the speech speed was appended to the MFCC vector, and delta and acceleration coefficients were computed, yielding an emotion feature that captures prosody elements such as loudness, pitch, and sound length. To analyze the utility of this feature, a recognition system was built with the emotion feature and the Hidden Markov Model (HMM). For maximum performance, the MFCC order, the codebook size, the speech-speed computation method, the window size of the speech-speed computation, the number of HMM states, and the number of Gaussian Mixture Model (GMM) components per state were selected. In the recognition tests, both a text-independent speaker-independent experiment and a text-independent speaker-dependent experiment were conducted. The tests verified that recognition with the emotion feature outperforms recognition with the speech feature alone, with improvements of 2.5% and 3.5%, respectively.
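The feature construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the exact normalization of the change count, the window handling, and the delta/acceleration formulas (here simple numerical gradients) are assumptions, and the window size of 20 frames stands in for the tuned value.

```python
import numpy as np

def vector_quantize(mfcc, codebook):
    """Map each MFCC frame (row of a (T, D) array) to the index of its
    nearest codebook vector ((K, D) array). The resulting index sequence
    is treated as a pseudo-phoneme sequence."""
    dists = ((mfcc[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def speech_speed(symbols, win=20):
    """Per-frame speech speed: the number of symbol changes inside a
    sliding window, normalized by the window length in frames."""
    changes = np.concatenate([[0.0], (np.diff(symbols) != 0).astype(float)])
    return np.convolve(changes, np.ones(win), mode="same") / win

def emotion_feature(mfcc, codebook, win=20):
    """Append speech speed to each MFCC frame, then stack delta and
    acceleration (delta-delta) coefficients, as the abstract outlines."""
    speed = speech_speed(vector_quantize(mfcc, codebook), win)
    static = np.hstack([mfcc, speed[:, None]])   # (T, D+1)
    delta = np.gradient(static, axis=0)          # first-order dynamics
    accel = np.gradient(delta, axis=0)           # second-order dynamics
    return np.hstack([static, delta, accel])     # (T, 3*(D+1))

# Example with random stand-ins for real data (hypothetical shapes):
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(500, 13))       # 500 frames of 13-dim MFCCs
codebook = rng.normal(size=(64, 13))    # 64-entry codebook
feat = emotion_feature(mfcc, codebook)  # (500, 42)
```

The abstract does not specify how the codebook is generated; a common choice for vector quantization would be k-means (e.g. scikit-learn's KMeans, using its cluster_centers_ as the codebook), but that clustering method is an assumption here.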

 
International Journal of Multimedia and Ubiquitous Engineering, Vol. 12, No. 1, January 2017  