RESEARCH
Y. J. Kim, Dept. of Computer Engineering, Hanbat National University
International Journal
A Study on the Emotion Feature composed of the Mel-frequency Cepstral Coefficient and the Speech Speed (Scopus)
Youjung Ko, Insuk Hong, Hyunsoon Shin, Yoonjoong Kim
 

Through an experiment, this research introduces and verifies the usefulness of an emotion feature that uses prosody attributes such as loudness, pitch, and sound length to express the characteristics of emotion. Sound length is proportional to pronunciation duration and inversely proportional to the number of phonemic changes per unit time. Based on this fact, the speech speed and the emotion feature were computed as follows. First, a codebook was generated from the Mel-frequency Cepstral Coefficient (MFCC) vectors of the provided speech data. Second, the MFCC vectors of the speech signal were vector-quantized with this codebook to produce a quantized sequence. Third, this sequence was treated as a phoneme sequence, and the speech speed was computed by normalizing the number of phoneme changes within each window. Fourth, the speech speed was appended to the MFCC vector, and delta and acceleration coefficients were computed, yielding an emotion feature that captures prosody elements such as loudness, pitch, and sound length. To analyze the utility of this feature, a recognition system was built with the emotion feature and the Hidden Markov Model (HMM). For maximum performance, the MFCC order, the codebook size, the speech-speed computation method, the window size of the speech-speed computation, the number of HMM states, and the number of Gaussian Mixture Model (GMM) components per state were selected. In the recognition tests, both a text-independent speaker-independent experiment and a text-independent speaker-dependent experiment were conducted. The tests verified that recognition with the emotion feature outperforms recognition with the speech feature alone, with improvements of 2.5% and 3.5%, respectively.
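The feature construction described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the exact normalization of the change count, the window handling, and the delta/acceleration formulas (here simple numerical gradients) are assumptions, and the window size of 20 frames stands in for the tuned value.

```python
import numpy as np

def vector_quantize(mfcc, codebook):
    """Map each MFCC frame (row of a (T, D) array) to the index of its
    nearest codebook vector ((K, D) array). The resulting index sequence
    is treated as a pseudo-phoneme sequence."""
    dists = ((mfcc[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def speech_speed(symbols, win=20):
    """Per-frame speech speed: the number of symbol changes inside a
    sliding window, normalized by the window length in frames."""
    changes = np.concatenate([[0.0], (np.diff(symbols) != 0).astype(float)])
    return np.convolve(changes, np.ones(win), mode="same") / win

def emotion_feature(mfcc, codebook, win=20):
    """Append speech speed to each MFCC frame, then stack delta and
    acceleration (delta-delta) coefficients, as the abstract outlines."""
    speed = speech_speed(vector_quantize(mfcc, codebook), win)
    static = np.hstack([mfcc, speed[:, None]])   # (T, D+1)
    delta = np.gradient(static, axis=0)          # first-order dynamics
    accel = np.gradient(delta, axis=0)           # second-order dynamics
    return np.hstack([static, delta, accel])     # (T, 3*(D+1))

# Example with random stand-ins for real data (hypothetical shapes):
rng = np.random.default_rng(0)
mfcc = rng.normal(size=(500, 13))       # 500 frames of 13-dim MFCCs
codebook = rng.normal(size=(64, 13))    # 64-entry codebook
feat = emotion_feature(mfcc, codebook)  # (500, 42)
```

The abstract does not specify how the codebook is generated; a common choice for vector quantization would be k-means (e.g. scikit-learn's KMeans, using its cluster_centers_ as the codebook), but that clustering method is an assumption here.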

 
International Journal of Multimedia and Ubiquitous Engineering, Vol. 12, No. 1, January 2017  