Papers - LEE Akinobu
-
Speaker Adaptation Based on Nonlinear Spectral Transform for Speech Recognition (co-authored) Reviewed
Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proc. Conference of the International Speech Communication Association (INTERSPEECH) 542 - 545 2010.09
Language:English Publishing type:Research paper (international conference proceedings)
-
A Covariance-Tying Technique for HMM-Based Speech Synthesis Reviewed
Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS 93 ( 3 ) 595 - 601 2010.03
Language:English Publishing type:Research paper (scientific journal) Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
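As a rough illustration of why tying covariances shrinks the footprint, the sketch below counts stored parameters with and without a shared covariance. It is not taken from the paper: diagonal covariances, the function name, and the figures of 1000 states and 40 dimensions are all assumptions chosen for illustration, and the mean-vector clustering described in the abstract is not modeled.

```python
def gaussian_param_count(num_states, dim, tie_covariances):
    """Count stored values for diagonal-covariance Gaussian state distributions."""
    mean_params = num_states * dim          # one mean vector per state
    if tie_covariances:
        var_params = dim                    # a single covariance shared by all states
    else:
        var_params = num_states * dim       # one covariance per state
    return mean_params + var_params


# Hypothetical model size, for illustration only.
untied = gaussian_param_count(1000, 40, tie_covariances=False)
tied = gaussian_param_count(1000, 40, tie_covariances=True)
print(untied, tied)
```

Under these assumptions the tied model stores roughly half as many values, since the per-state variances collapse into one shared vector.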
-
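Decoders and Recognition Engines for Speech Recognition Reviewed
Akinobu Lee
The Journal of the Acoustical Society of Japan, Acoustical Society of Japan 66 ( 1 ) 28 - 31 2010.01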
Language:English Publishing type:Research paper (scientific journal)
-
Speaker Adaptation Based on Nonlinear Spectral Transform for Speech Recognition Reviewed
Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2 542 - 545 2010
Language:English Publishing type:Research paper (international conference proceedings) Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC
This paper proposes a speaker adaptation technique using a nonlinear spectral transform based on GMMs. One of the most popular forms of speaker adaptation is based on linear transforms, e.g., MLLR. Although MLLR uses multiple transforms according to regression classes, only a single linear transform is applied to each state. The proposed method performs nonlinear speaker adaptation based on a new likelihood function combining HMMs for recognition with GMMs for spectral transform. Moreover, the dependency of transforms on context can also be estimated in an integrated ML fashion. The proposed technique outperformed conventional approaches in phoneme-recognition experiments.
-
Voice activity detection based on conditional random fields using multiple features Reviewed
Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4 2086 - 2089 2010
Language:English Publishing type:Research paper (international conference proceedings) Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC
This paper proposes a Voice Activity Detection (VAD) algorithm based on Conditional Random Fields (CRF) using multiple features. VAD is a technique used to distinguish between speech and non-speech in noisy environments and is an important component in many real-world speech applications. The posterior probability of output labels in the proposed method is directly modeled by the weighted sum of the feature functions. Effective features are automatically selected by estimating appropriate weight parameters to improve the accuracy of VAD. Experimental results on the CENSREC-1-C database revealed that the proposed approach can decrease error rates by using CRF.
-
Computational Reduction of Continuous Speech Recognition Software "Julius" on SuperH Microprocessor Reviewed
50 ( 11 ) 2597 - 2606 2009.11
Language:Japanese Publishing type:Research paper (scientific journal)
-
Development of a Toolkit for Spoken Dialog System with an Anthropomorphic Agent: Galatea Reviewed
Kouichi Katsurada, Akinobu Lee, Tatsuya Kawahara, Tatsuo Yotsukura, Shigeo Morishima, Takuya Nishimoto, Yoichi Yamashita, and Tsuneo Nitta
Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 148 - 153 2009.10
Language:English Publishing type:Research paper (other academic)
-
Recent Development of Open-Source Speech Recognition Engine Julius Reviewed
Akinobu Lee and Tatsuya Kawahara
Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 131 - 137 2009.10
Language:English Publishing type:Research paper (other academic)
-
Tying Covariance Matrices to Reduce the Footprint of HMM-based Speech Synthesis Systems Reviewed
Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda
Proc. Conference of the International Speech Communication Association (INTERSPEECH) 1759 - 1762 2009.09
Language:English Publishing type:Research paper (other academic)
-
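Comprehensive Report: Integrated Development of Natural Spoken Dialogue Processing Technology That Achieves Speaker and Environment Adaptability Without User Burden
Kiyohiro Shikano, Kazuya Takeda, Tatsuya Kawahara, Hideki Kawahara, Hiroshi Saruwatari, Keiichi Tokuda, Akinobu Lee, Hiromichi Kawanami, Ryuichi Nisimura, Randy GOMEZ, Tomoki Toda, Takanobu Nishiura, Toru Takahashi, Hideki Banno, Heiga Zen
The Journal of the Institute of Electronics, Information and Communication Engineers 92 ( 6 ) 2009.06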
Language:Japanese Publishing type:Research paper (scientific journal)
-
Voice Conversion based on Simultaneous Modeling of Spectrum and F0 Reviewed
Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda
Proc. IEEE International Conference on Acoustics, Speech and Signal Processing 3897 - 3900 2009.04
Language:English Publishing type:Research paper (other academic)
-
Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems Reviewed
Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 1723 - 1726 2009
Language:English Publishing type:Research paper (international conference proceedings) Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC
This paper proposes a technique for reducing the footprint of HMM-based speech synthesis systems by tying all covariance matrices. HMM-based speech synthesis systems usually have a smaller footprint than unit-selection synthesis systems because statistics rather than speech waveforms are stored. However, further reduction is essential to put them on embedded devices, which have very small memory. Based on the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results show that the proposed technique can shrink the footprint of an HMM-based speech synthesis system while retaining the quality of synthesized speech.
-
VOICE CONVERSION BASED ON SIMULTANEOUS MODELING OF SPECTRUM AND F0 Reviewed
Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS 3897 - 3900 2009
Language:English Publishing type:Research paper (international conference proceedings) Publisher:IEEE
This paper proposes simultaneous modeling of spectrum and F0 for voice conversion based on MSD (Multi-Space Probability Distribution) models. As a conventional technique, spectral conversion based on a GMM (Gaussian Mixture Model) has been proposed. Although this technique converts spectral feature sequences nonlinearly based on the GMM, F0 sequences are usually converted by a simple linear function. This is because F0 is undefined in unvoiced segments. To overcome this problem, we apply MSD models. The MSD-GMM makes it possible to model continuous F0 values in voiced frames and a discrete symbol representing unvoiced frames within a unified framework. Furthermore, the MSD-HMM is adopted to model long-term correlations in F0 sequences.
-
Speaker recognition based on Gaussian mixture models using variational Bayesian method
Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
IEICE Technical Report 108 ( 338 ) 185 - 190 2008.12
Language:English Publishing type:Research paper (conference, symposium, etc.)
-
Speech recognition based on statistical models including multiple decision trees
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
IEICE Technical Report 108 ( 338 ) 221 - 226 2008.12
Language:English Publishing type:Research paper (conference, symposium, etc.)
-
A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System Reviewed
Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E91D ( 11 ) 2693 - 2700 2008.11
Language:English Publishing type:Research paper (scientific journal) Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
In a hidden Markov model (HMM), state duration probabilities decrease exponentially with time, which fails to adequately represent the temporal structure of speech. One of the solutions to this problem is integrating state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM). However, though a number of attempts to use HSMMs in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding. By avoiding these approximations using a generalized forward-backward algorithm, a context-dependent duration modeling technique and weighted finite-state transducers (WFSTs), we construct a fully consistent HSMM-based speech recognition system. In a speaker-dependent continuous speech recognition experiment, our system achieved about 9.1% relative error reduction over the corresponding HMM-based system.
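The exponential decay mentioned at the start of this abstract follows from the self-transition structure of an HMM: a state with self-transition probability a is occupied for exactly d frames with probability a^(d-1) (1 - a). The sketch below is not taken from the paper; the function name and the value a = 0.9 are assumptions chosen for illustration.

```python
def hmm_state_duration_pmf(self_loop_prob, max_duration):
    """Implicit duration distribution of a single HMM state.

    Staying for d frames means taking the self-loop d - 1 times and then
    leaving once, so P(d) = a**(d - 1) * (1 - a).
    """
    a = self_loop_prob
    return [a ** (d - 1) * (1.0 - a) for d in range(1, max_duration + 1)]


# Hypothetical self-transition probability, for illustration only.
pmf = hmm_state_duration_pmf(0.9, 5)
for d, p in enumerate(pmf, start=1):
    print(d, round(p, 4))
```

Whatever value a takes, the most probable duration is a single frame and each additional frame multiplies the probability by a. An HSMM replaces this implicit distribution with an explicit one.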
-
Acoustic modeling based on model structure annealing for speech recognition Reviewed
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proceedings of Interspeech 2008 932 - 935 2008.09
Language:English Publishing type:Research paper (international conference proceedings)
-
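A Study of Speech Recognition Using Multiple Phonetic Decision Trees
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proceedings of the Acoustical Society of Japan 2008 Autumn Meeting 125 - 126 2008.09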
Language:Japanese Publishing type:Research paper (other academic)
-
Speaker recognition based on variational Bayesian method Reviewed
Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proceedings of Interspeech 2008 1417 - 1420 2008.09
Language:English Publishing type:Research paper (international conference proceedings)
-
Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition Reviewed
Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proceedings of Interspeech 2008 936 - 939 2008.09
Language:English Publishing type:Research paper (international conference proceedings)