Papers - LEE Akinobu
-
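A study of a pruning method for word-specific nodes based on word confidence in large vocabulary continuous speech recognition
小林 大晃, 伊藤 直晃, 李 晃伸
Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan 2014.03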
Language:Japanese Publishing type:Research paper (conference, symposium, etc.)
-
3-Q5-13 - 224 2014.03
Language:Japanese Publishing type:Research paper (conference, symposium, etc.)
-
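Low-latency speech interface using sequential early determination of hypotheses based on conditional random fields
伊神 陽介, 李 晃伸, 徳田 恵一, 南角 吉彦
Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan 2-4-7 2014.03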
Language:Japanese Publishing type:Research paper (conference, symposium, etc.)
-
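A study of a concise dialogue description method based on finite-state transducers for user-generated spoken dialogue content
船谷内 泰斗, 大浦 圭一郎, 南角 吉彦, 李 晃伸, 徳田 恵一
Proceedings of the Acoustical Society of Japan Meeting 223 - 224 2013.09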
Language:Japanese Publishing type:Research paper (scientific journal)
-
MMDAGENT - A FULLY OPEN-SOURCE TOOLKIT FOR VOICE INTERACTION SYSTEMS Reviewed International journal
Akinobu Lee, Keiichiro Oura, Keiichi Tokuda
2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP) 8382 - 8385 2013.05
Authorship:Lead author Language:English Publishing type:Research paper (international conference proceedings) Publisher:IEEE
This paper describes the development of an open-source toolkit that makes it possible to explore a wide variety of aspects of speech interaction in spoken dialog systems and speech interfaces. The toolkit tightly integrates recent speech recognition and synthesis technologies with a 3-D CG rendering module that can manipulate expressive embodied agent characters. The software and its interfaces are carefully designed to make it a fully open toolkit. Ongoing public demonstration experiments indicate that it is promoting related research and development of voice interaction systems in various settings.
-
Development of "Smart Mei-chan", a spoken dialogue 3D agent that runs standalone on a smartphone
山本 大介, 大浦 圭一郎, 李 晃伸 et al.
Information Processing Society of Japan, Interaction 675 - 680 2013.03
Language:Japanese Publishing type:Research paper (conference, symposium, etc.)
-
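Development, installation, and operation of a user-participatory interactive voice guidance digital signage system: a case study Invited
徳田恵一, 大浦圭一郎, 李晃伸, 山本大介, 打矢隆弘, 内匠逸
Proceedings of the 2013 Spring Meeting of the Acoustical Society of Japan 119 - 122 2013.03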
Language:Japanese Publishing type:Research paper (scientific journal)
-
Keiichiro Oura, Daisuke Yamamoto, Ichi Takumi, Akinobu Lee, Keiichi Tokuda
28 ( 1 ) 60 - 67 2013.01
Language:Japanese Publishing type:Research paper (scientific journal)
Other Link: http://id.nii.ac.jp/1004/00008160/
-
Ryuichi Nishimura, Sunao Hara, Hiromichi Kawanami, Akinobu Lee, Kiyohiro Shikano
Journal of the Japanese Society for Artificial Intelligence 28 ( 1 ) 52 - 59 2013.01
Language:Japanese Publishing type:Research paper (scientific journal) Publisher:The Japanese Society for Artificial Intelligence
Other Link: http://id.nii.ac.jp/1004/00008159/
-
Automatic estimation of driver character with respect to sociality
神沼 充伸, 西崎 友規子, ブエ・ステファン, 南角 吉彦, 李 晃伸
Proceedings of Human Interface 2012 2012.09
Language:Japanese Publishing type:Research paper (other academic)
-
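Statistical spoken dialogue system based on tight coupling of the speech recognition and response selection modules using registered keywords and a general-purpose language model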
平野隆司, 加藤杏樹, 南角吉彦, 李晃伸, 徳田恵一
2012 Information Processing Society of Japan 2012-SLP-92 ( 3 ) 1 - 6 2012.07
Language:Japanese Publishing type:Research paper (scientific journal)
-
Campus event registration system for interactive voice digital signage
山本大介, 大浦圭一郎, 李晃伸, 打矢隆弘, 内匠逸, 徳田恵一, 松尾啓志
FY2011 Annual Conference of the University ICT Promotion Council (大学ICT推進協議会) 2011.12
Language:Japanese Publishing type:Research paper (other academic)
-
MMDAgent: an open-source toolkit for building attractive voice interaction systems
李晃伸, 大浦圭一郎, 徳田恵一
Technical Report of IEICE 1 - 6 2011.12
Language:Japanese Publishing type:Research paper (other academic)
-
Speech recognition based on statistical models including multiple phonetic decision trees Reviewed
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Acoustical Science and Technology 32 ( 6 ) 236 - 243 2011.11
Language:English Publishing type:Research paper (scientific journal)
-
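Evaluation of a low-latency sequential hypothesis determination algorithm for continuous speech recognition
大野博之, 南角吉彦, 李晃伸, 徳田恵一
Proceedings of the 2011 Autumn Meeting of the Acoustical Society of Japan 45 - 46 2011.09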
Language:Japanese Publishing type:Research paper (other academic)
-
Evaluation of Tree-Trellis Based Decoding on Over-Million LVCSR Reviewed
Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proc. ISCA Interspeech2011 1937 - 1940 2011.08
Language:English Publishing type:Research paper (international conference proceedings)
-
Bayesian Context Clustering Using Cross Validation for Speech Recognition Reviewed
Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E94-D ( 3 ) 668 - 678 2011.03
Language:English Publishing type:Research paper (scientific journal) Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG
This paper proposes Bayesian context clustering using cross validation for hidden Markov model (HMM) based speech recognition. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by treating model parameters as random variables. The variational Bayesian method, which is widely used as an efficient approximation of the Bayesian approach, has been applied to HMM-based speech recognition, and it shows good performance. Moreover, the Bayesian approach can select an appropriate model structure while taking account of the amount of training data. Since prior distributions which represent prior information about model parameters affect estimation of the posterior distributions and selection of model structure (e.g., decision tree based context clustering), the determination of prior distributions is an important problem. However, it has not been thoroughly investigated in speech recognition, and the determination technique of prior distributions has not performed well. The proposed method can determine reliable prior distributions without any tuning parameters and select an appropriate model structure while taking account of the amount of training data. Continuous phoneme recognition experiments show that the proposed method achieved a higher performance than the conventional methods.
-
Evaluation of Tree-trellis based Decoding in Over-million LVCSR Reviewed
Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 1948 - 1951 2011
Language:English Publishing type:Research paper (international conference proceedings) Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC
Very large vocabulary continuous speech recognition (CSR) that can recognize every sentence is one of the important goals in speech recognition. Several attempts have been made to achieve very large vocabulary CSR. However, very large vocabulary CSR using a tree-trellis based decoder has not been reported. We report the performance evaluation and improvement of the "Julius" tree-trellis based decoder in large vocabulary CSR (LVCSR) with a vocabulary of more than one million words, referred to here as over-million LVCSR. Experiments indicated that Julius achieved a word accuracy of about 91% and a real time factor of about 2 in over-million LVCSR for Japanese newspaper speech transcription.
-
Speech recognition based on statistical models including multiple phonetic decision trees Reviewed
Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Acoustical Science and Technology 32 ( 6 ) 236 - 243 2011
Language:English Publishing type:Research paper (scientific journal)
We propose a speech recognition technique using multiple model structures. In the use of context-dependent models, decision-tree-based context clustering is applied to find an appropriate parameter tying structure. However, context clustering is usually performed on the basis of unreliable statistics of hidden Markov model (HMM) state sequences, because the estimation of reliable state sequences requires an appropriate model structure, which cannot be obtained prior to context clustering. Therefore, context clustering and the estimation of state sequences essentially cannot be performed independently. To overcome this problem, we propose an optimization technique for state sequences based on an annealing process using multiple decision trees. In this technique, a new likelihood function is defined in order to treat multiple model structures, and the deterministic annealing expectation maximization algorithm is used as the training algorithm. Experimental continuous phoneme recognition results show that the proposed method using only two decision trees achieved about an 11.1% relative error reduction over the conventional method. © 2011 The Acoustical Society of Japan.
DOI: 10.1250/ast.32.236
-
Voice activity detection based on conditional random fields using multiple features (co-authored) Reviewed
Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda
Proc. Conference of the International Speech Communication Association (INTERSPEECH) 2086 - 2089 2010.09
Language:English Publishing type:Research paper (international conference proceedings)