
International Journal on Smart Sensing and Intelligent Systems

Professor Subhas Chandra Mukhopadhyay

Exeley Inc. (New York)

Subject: Computational Science & Engineering, Engineering, Electrical & Electronic


eISSN: 1178-5608



VOLUME 8, ISSUE 4 (December 2015)


Zhang Zhigang * / Huang Junqin *

Keywords : Voice signal, Endpoint detection, Short-time amplitude, Multi-scale detection, Adaptive threshold.

Citation Information : International Journal on Smart Sensing and Intelligent Systems. Volume 8, Issue 4, Pages 2175-2194, DOI:

License : (CC BY-NC-ND 4.0)

Received Date : 10-May-2015 / Accepted: 10-November-2015 / Published Online: 01-December-2015



Voice Activity Detection (VAD) is a crucial step in speech processing; its accuracy and speed directly affect the quality of subsequent processing. Some voice processing systems, such as those that are phone-based or operate indoors, need a simple and fast VAD method. For these representative voice signals, this paper proposes a new adaptive and fast algorithm based on a major improvement to the dual-threshold endpoint detection algorithm. First, the original voice signal is amplitude-normalized and its feature is extracted as short-time amplitude, which simplifies computation. Then, large-scale (long frame length and frame shift) short-time amplitude is used for rough detection; combined with an adaptive threshold decision over consecutive frames, this quickly finds the regions that contain the start point and the end point. Within these regions, small-scale (short frame length and frame shift) short-time amplitude is used for accurate detection: the start-point region is scanned forward and the end-point region is scanned in reverse, again with an adaptive threshold decision over consecutive frames, so that the start point and end point of the effective speech are accurately located. Experimental results show that the proposed method detects the endpoints of voice signals more quickly and accurately, which can improve recognition performance dramatically. The large scale increases detection speed and the small scale improves detection accuracy; both can be adjusted to meet different requirements. The proposed method thus ensures both detection speed and precision, and offers greater flexibility and applicability.
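
The two-stage procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the threshold formula or the parameter values, so the threshold rule (a noise floor estimated from the leading frames plus a fraction of the dynamic range), the frame sizes, and the names `detect_endpoints`, `first_run`, `min_run`, `noise_frames` and `ratio` are all assumptions made for illustration. The sketch also uses the mean rather than the sum of absolute amplitudes per frame, so that one threshold can be reused at both scales.

```python
from typing import List, Optional, Tuple


def normalize(signal: List[float]) -> List[float]:
    """Scale the signal so that its peak absolute amplitude is 1."""
    peak = max((abs(x) for x in signal), default=0.0)
    return [x / peak for x in signal] if peak > 0 else list(signal)


def short_time_amplitude(signal: List[float], frame_len: int,
                         frame_shift: int) -> List[float]:
    """Mean absolute amplitude of each frame (assumption: mean, not sum)."""
    last_start = len(signal) - frame_len
    return [sum(abs(x) for x in signal[i:i + frame_len]) / frame_len
            for i in range(0, last_start + 1, frame_shift)]


def first_run(amps: List[float], threshold: float,
              min_run: int) -> Optional[int]:
    """Index of the first frame that begins min_run consecutive frames above threshold."""
    run = 0
    for i, a in enumerate(amps):
        run = run + 1 if a > threshold else 0
        if run == min_run:
            return i - min_run + 1
    return None


def detect_endpoints(signal: List[float],
                     coarse: Tuple[int, int] = (400, 200),
                     fine: Tuple[int, int] = (40, 20),
                     min_run: int = 2,
                     noise_frames: int = 2,
                     ratio: float = 0.1) -> Optional[Tuple[int, int]]:
    """Return (start_sample, end_sample) of the speech, or None if none is found."""
    x = normalize(signal)
    c_len, c_shift = coarse
    f_len, f_shift = fine

    c_amps = short_time_amplitude(x, c_len, c_shift)
    if not c_amps:
        return None

    # Hypothetical adaptive threshold: leading frames are assumed to be noise only.
    lead = c_amps[:noise_frames]
    noise = sum(lead) / len(lead)
    threshold = noise + ratio * (max(c_amps) - noise)

    # Rough detection at the large scale: forward scan for the start frame,
    # reverse scan for the end frame.
    c_start = first_run(c_amps, threshold, min_run)
    if c_start is None:
        return None
    c_end = len(c_amps) - 1 - first_run(c_amps[::-1], threshold, min_run)

    # Accurate detection of the start point: forward scan at the small scale
    # over the coarse start frame and the frame before it.
    lo = max(0, (c_start - 1) * c_shift)
    hi = c_start * c_shift + c_len
    f_amps = short_time_amplitude(x[lo:hi], f_len, f_shift)
    k = first_run(f_amps, threshold, min_run)
    start = lo + k * f_shift if k is not None else c_start * c_shift

    # Accurate detection of the end point: reverse scan at the small scale
    # over the coarse end frame and the frame after it.
    lo = c_end * c_shift
    hi = min(len(x), (c_end + 1) * c_shift + c_len)
    f_amps = short_time_amplitude(x[lo:hi], f_len, f_shift)
    k = first_run(f_amps[::-1], threshold, min_run)
    if k is not None:
        end = lo + (len(f_amps) - 1 - k) * f_shift + f_len
    else:
        end = c_end * c_shift + c_len
    return start, end
```

Calling `detect_endpoints(samples)` on a list of samples returns a pair of sample indices, or `None` when no run of frames exceeds the threshold. In this sketch, larger `coarse` values reduce the number of frames scanned in the first pass, while smaller `fine` values tighten the resolution of the located endpoints, which mirrors the speed and accuracy trade-off described above.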



