Papers - Akinobu Lee (李 晃伸)
-
Speaker Adaptation Based on Nonlinear Spectral Transform for Speech Recognition (co-authored) Peer-reviewed
Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proc. Conference of the International Speech Communication Association (INTERSPEECH) 542 - 545 September 2010
Language: English Publication type: Research paper (international conference proceedings)
-
A Covariance-Tying Technique for HMM-Based Speech Synthesis (co-authored) Peer-reviewed
Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
IEICE Transactions on Information and Systems 93 ( 3 ) 595 - 601 March 2010
-
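Decoders and Recognition Engines for Speech Recognition (音声認識のデコーダと認識エンジン) Peer-reviewed
Akinobu Lee
Journal of the Acoustical Society of Japan, Acoustical Society of Japan 66 ( 1 ) 28 - 31 January 2010
Language: English Publication type: Research paper (academic journal)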
-
Voice activity detection based on conditional random fields using multiple features (co-authored) Peer-reviewed
Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proc. Conference of the International Speech Communication Association (INTERSPEECH) 2086 - 2089 2010
-
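Reducing the Computational Cost of the Continuous Speech Recognition Software Julius for Deployment on SuperH Microcontrollers (SuperHマイコンへの搭載を目的とした連続音声認識ソフトウェアJuliusの計算量削減) Peer-reviewed
Hiroaki Kokubo, Nobuo Hataoka, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano
IPSJ Journal 50 ( 11 ) 2597 - 2606 November 2009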
-
Development of a Toolkit for Spoken Dialog System with an Anthropomorphic Agent: Galatea Peer-reviewed
Kouichi Katsurada, Akinobu Lee, Tatsuya Kawahara, Tatsuo Yotsukura, Shigeo Morishima, Takuya Nishimoto, Yoichi Yamashita, and Tsuneo Nitta
Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 148 - 153 October 2009
Language: English Publication type: Research paper (other academic conference materials, etc.)
-
Recent Development of Open-Source Speech Recognition Engine Julius Peer-reviewed
Akinobu Lee and Tatsuya Kawahara
Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 131 - 137 October 2009
Language: English Publication type: Research paper (other academic conference materials, etc.)
-
Tying Covariance Matrices to Reduce the Footprint of HMM-based Speech Synthesis Systems Peer-reviewed
Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda
Proc. Conference of the International Speech Communication Association (INTERSPEECH) 1759 - 1762 September 2009
Language: English Publication type: Research paper (other academic conference materials, etc.)
-
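Comprehensive Report: Integrated Development of Natural Spoken Dialogue Processing Technology Achieving Speaker and Environment Adaptability without User Burden (総合報告 ユーザ負担のない話者・環境適応性を実現する自然な音声対話処理技術の総合開発)
Kiyohiro Shikano, Kazuya Takeda, Tatsuya Kawahara, Hideki Kawahara, Hiroshi Saruwatari, Keiichi Tokuda, Akinobu Lee, Hiromichi Kawanami, Ryuichi Nisimura, Randy Gomez, Tomoki Toda, Takanobu Nishiura, Toru Takahashi, Hideki Banno, Heiga Zen
Journal of the IEICE 92 ( 6 ) June 2009
Language: Japanese Publication type: Research paper (academic journal)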
-
Voice Conversion based on Simultaneous Modeling of Spectrum and F0 Peer-reviewed
Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda
Proc. IEEE International Conference on Acoustics, Speech and Signal Processing 3897 - 3900 April 2009
Language: English Publication type: Research paper (other academic conference materials, etc.)
-
Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems Peer-reviewed
Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 1723 - 1726 2009
Language: English Publication type: Research paper (international conference proceedings) Publisher: ISCA-INST SPEECH COMMUNICATION ASSOC
This paper proposes a technique for reducing the footprint of HMM-based speech synthesis systems by tying all covariance matrices. HMM-based speech synthesis systems usually have a smaller footprint than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential for deploying them on embedded devices with very little memory. Based on the empirical knowledge that covariance matrices have less impact on the quality of synthesized speech than mean vectors, we propose a technique that clusters mean vectors while tying all covariance matrices. Subjective listening test results show that the proposed technique can shrink the footprint of an HMM-based speech synthesis system while retaining the quality of synthesized speech.
-
VOICE CONVERSION BASED ON SIMULTANEOUS MODELING OF SPECTRUM AND F0 Peer-reviewed
Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS 3897 - 3900 2009
Language: English Publication type: Research paper (international conference proceedings) Publisher: IEEE
This paper proposes simultaneous modeling of spectrum and F0 for voice conversion based on MSD (Multi-Space Probability Distribution) models. As a conventional technique, spectral conversion based on GMMs (Gaussian Mixture Models) has been proposed. Although this technique converts spectral feature sequences nonlinearly based on a GMM, F0 sequences are usually converted by a simple linear function. This is because F0 is undefined in unvoiced segments. To overcome this problem, we apply MSD models. The MSD-GMM makes it possible to model continuous F0 values in voiced frames and a discrete symbol representing unvoiced frames within a unified framework. Furthermore, the MSD-HMM is adopted to model long-term correlations in F0 sequences.
-
Speaker recognition based on Gaussian mixture models using variational Bayesian method
Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
IEICE Technical Report 108 ( 338 ) 185 - 190 December 2008
Language: English Publication type: Research paper (workshop or symposium materials, etc.)
-
Speech recognition based on statistical models including multiple decision trees
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
IEICE Technical Report 108 ( 338 ) 221 - 226 December 2008
Language: English Publication type: Research paper (workshop or symposium materials, etc.)
-
A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System Peer-reviewed
Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E91D ( 11 ) 2693 - 2700 November 2008
Language: English Publication type: Research paper (academic journal) Publisher: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
In a hidden Markov model (HMM), state duration probabilities decrease exponentially with time, which fails to adequately represent the temporal structure of speech. One solution to this problem is to integrate state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM). However, although a number of attempts to use HSMMs in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding. By avoiding these approximations using a generalized forward-backward algorithm, a context-dependent duration modeling technique, and weighted finite-state transducers (WFSTs), we construct a fully consistent HSMM-based speech recognition system. In a speaker-dependent continuous speech recognition experiment, our system achieved about 9.1% relative error reduction over the corresponding HMM-based system.
-
Acoustic modeling based on model structure annealing for speech recognition Peer-reviewed
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proceedings of Interspeech 2008 932 - 935 September 2008
Language: English Publication type: Research paper (international conference proceedings)
-
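A Study of Speech Recognition Using Multiple Phonetic Decision Trees (複数の音素決定木を用いた音声認識の検討)
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proceedings of the Acoustical Society of Japan 2008 Autumn Meeting 125 - 126 September 2008
Language: Japanese Publication type: Research paper (other academic conference materials, etc.)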
-
Speaker recognition based on variational Bayesian method Peer-reviewed
Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proceedings of Interspeech 2008 1417 - 1420 September 2008
Language: English Publication type: Research paper (international conference proceedings)
-
Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition Peer-reviewed
Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proceedings of Interspeech 2008 936 - 939 September 2008
Language: English Publication type: Research paper (international conference proceedings)