Researcher Details - 李　晃伸 (Akinobu Lee)

Speaker-Aware BERT for Multi-Party Dialog Response Selection (peer-reviewed, international co-authorship, international journal)

Tatsuya Nishiyama, Ryota Tanaka, Yuya Ishijima, Akinobu Lee

Proc. AAAI2020 Dialogue System Technology Challenge 8 workshop 2020年02月

Language: English. Publication type: Research paper (international conference proceedings)

Other link: https://sites.google.com/dstc.community/dstc8/aaai-20-workshop

言語対の音素事後確率を用いた第二言語学習者の発音習熟度判別

森凜太朗, 李晃伸

電子情報通信学会音声研究会（IEICE-SP） 2019年12月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

個別の発話スタイルを強調する Boosting Framework を用いた感情表現生成

尾関晃英, 李晃伸

情報処理学会自然言語処理研究会（IPSJ-NL） 2019年12月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

話題展開器を導入した外部知識に基づくニューラル対話モデル

田中涼太, 李晃伸

情報処理学会自然言語処理研究会（IPSJ-NL） 2019年12月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

Ensemble Dialogue System for Facts-Based Sentence Generation (peer-reviewed, international co-authorship, international journal)

Ryota Tanaka, Akihide Ozeki, Shugo Kato, Akinobu Lee

Proc. AAAI2019 Dialogue System Technology Challenge 7 workshop 2019年01月

Language: English. Publication type: Research paper (international conference proceedings)

arXiv

Other link: http://workshop.colips.org/dstc7/workshop.html

《第8回》機械学習と語学学習：語学学習のための英会話シミュレーターとその設計 (peer-reviewed)

木村光成, 李晃伸, 川嶋宏彰

計測と制御 58 ( 11 ) 873 - 877 2019年

Language: Japanese. Publication type: Research paper (academic journal). Publisher: 公益社団法人計測自動制御学会

外部事実情報と対話履歴を用いたアンサンブル対話システム

田中涼太, 尾関晃英, 加藤修悟, 李晃伸

SIG-SLUD 2018年11月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

This study aims to avoid "safe responses" by conditioning on the dialogue context and on external
facts extracted from information websites (e.g., Wikipedia), so that responses are generated based on
real-world facts. The system, called the Ensemble Dialogue System, consists of three submodules:
a generation-based module, a facts retrieval module, and a reranking module. The response
can thus be determined from various viewpoints by combining multiple systems. Experiments
and evaluations were conducted on the sentence generation task of Dialog System Technology
Challenges 7, where our system performed significantly better than many competing systems.

再帰型ニューラルネットに基づく音素情報を用いた応答選択

牧野健一郎，李晃伸

情報処理学会音声言語情報処理研究会研究報告 2017-SLP-117 ( 4 ) 1 - 6 2017年07月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.). Publisher: 情報処理学会

日本語におけるG2Pによる統計的学習を用いた話し言葉に頑健な発音辞書の自動構築

寺田卓矢，李晃伸

情報処理学会音声言語情報処理研究会研究報告 2017-SLP-117 ( 11 ) 1 - 6 2017年07月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.). Publisher: 情報処理学会

国際会議 ICASSP2017 報告

浅見太一, 大谷大和, 岡本拓磨, 小川哲司, 落合翼, 亀岡弘和, 駒谷和範, 高木信二, 高道慎之介, 俵直弘, 南條浩輝, 橋本佳, 福田隆, 増村亮, 松田繁樹, 李晃伸, 渡部晋治

第117回音声言語情報処理研究会 (SIG-SLP) SLP-3 2017年07月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

User generated dialogue systems: uDialogue (peer-reviewed)

Keiichi Tokuda, Akinobu Lee, Yoshihiko Nankaku, Keiichiro Oura, Kei Hashimoto, Daisuke Yamamoto, Ichi Takumi, Takahiro Uchiya, Shuhei Tsutsumi, Steve Renals, Junichi Yamagishi

Human-Harmonized Information Technology 2 77 - 114 2017年04月

Language: English. Publication type: Paper in a collection (book). Publisher: Springer Japan

This chapter introduces the idea of user-generated dialogue content and describes our experimental exploration aimed at clarifying the mechanism and conditions that make it workable in practice. One of the attractive points of a speech interface is that it provides a vivid sense of interactivity that cannot be achieved with a text interface alone. This study proposes a framework in which spoken dialogue systems are separated into content that can be produced and modified by users and the systems that drive that content. It seeks to clarify (1) the requirements of systems that enable the creation of attractive spoken dialogue, and (2) the conditions for the active generation of attractive dialogue content by users, while attempting to establish a method for realizing them. Experiments for validating user dialogue content generation were performed by installing interactive digital signage with a speech interface in public spaces as a dialogue device, and by implementing a content generation environment for users via the Internet. The proposed framework is expected to lead to a breakthrough in the spread of speech technology.

DOI： 10.1007/978-4-431-56535-2_3

音声対話システムにおける環境および知識の共有表出と話しかけやすさの関連調査

興梠斗吾，李晃伸

言語・音声理解と対話処理研究会 78 ( 78 ) 125 - 128 2016年10月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.). Publisher: 人工知能学会

話しやすい音声対話システム実現のための対人対話における心理特性の関連性調査

佐藤翔平，李晃伸

言語・音声理解と対話処理研究会 78 ( 78 ) 129 - 134 2016年10月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.). Publisher: 人工知能学会

ユーザフレンドリィな音声対話システム実現のためのユーザ話速および発話内容に基づくシステム話速制御手法の検討

三原寛哉, 李晃伸

研究報告音声言語情報処理（SLP） 2016-SLP-112 ( 15 ) 1 - 6 2016年07月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.). Publisher: 情報処理学会

音声対話システムのオープンコンテンツ化実現のためのモジュール仕様および管理手法

山西元樹，船谷内泰斗，李晃伸

研究報告音声言語情報処理（SLP） 2016-SLP-112 ( 14 ) 1 - 6 2016年07月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.). Publisher: 情報処理学会

音声対話システムにおけるシステムからの話しかけと他者性認知の関連性の調査

村上拓也, 李晃伸, 西川由里, 小島良広, 遠藤充

HAIシンポジウム2015 238 - 243 2015年12月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

音声対話インタフェースにおけるマルチタスク性の適切な表出方法の検討

小中彩貴, 李晃伸

HAIシンポジウム2015 108 - 112 2015年12月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

音声対話システムにおける音環境への反応表出によるアフォーダンスの評価

夏目　龍司, 李晃伸

HAIシンポジウム2015 94 - 98 2015年12月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

利用者による履歴付き対話の共同構築・拡張が可能なユーザ生成音声対話システム

宮木京介, 飯塚遼, 李晃伸

日本音響学会2015年秋季研究発表会講演論文集 3-Q-22 2015年09月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

単語間非共有ノードに基づく単語信頼度を用いたキーワードの発話中逐次確定

松尾涼平, 小林大晃, 李晃伸

日本音響学会2015年秋季研究発表会講演論文集 3-Q-12 2015年09月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

Prosodically-Enhanced Recurrent Neural Network Language Models (peer-reviewed, international co-authorship, international journal)

Siva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, Akinobu Lee

Proc. Conference of the International Speech Communication Association (INTERSPEECH) 2390 - 2394 2015年09月

Language: English. Publication type: Research paper (international conference proceedings)

Prosodically-enhanced Recurrent Neural Network Language Models (peer-reviewed)

Siva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, Akinobu Lee

16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5 2390 - 2394 2015年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC

Recurrent neural network language models (RNNLMs) have been shown to consistently reduce the word error rates (WERs) of large vocabulary speech recognition tasks. In this work, we propose to enhance RNNLMs with prosodic features computed using the context of the current word. Since prosodic features can plausibly be computed at both the word and syllable levels, we trained the models on features computed at both levels. To investigate the effectiveness of the proposed models, we report perplexity and WER for two speech recognition tasks, Switchboard and TED. We observed substantial improvements in perplexity and small improvements in WER.

Voice interaction system with 3D-CG virtual agent for stand-alone smartphones (peer-reviewed)

Daisuke Yamamoto, Keiichiro Oura, Ryota Nishimura, Takahiro Uchiya, Akinobu Lee, Keiichi Tokuda, Ichi Takumi

HAI 2014 - Proceedings of the 2nd International Conference on Human-Agent Interaction 323 - 330 2014年10月

Language: English. Publication type: Research paper (international conference proceedings). Publisher: Association for Computing Machinery, Inc

In this paper, we propose a voice interaction system using 3D-CG virtual agents for stand-alone smartphones. Unlike existing mobile voice interaction systems, the proposed system handles speech recognition and speech synthesis on the smartphone itself, which enables users to talk naturally without encountering delays caused by network communications. Moreover, the proposed system can be fully customized through dialogue scripts, Java-based plugins, and Android APIs, so developers can easily build original voice interaction systems for smartphones on top of it. We have made a subset of the proposed system available as open-source software. We expect that this system will contribute to studies of human-agent interaction using smartphones.

DOI： 10.1145/2658861.2658874

Voice interaction system with 3D-CG virtual agent for stand-alone smartphones (peer-reviewed, international journal)

Daisuke Yamamoto, Keiichiro Oura, Ryota Nishimura, Takahiro Uchiya, Akinobu Lee, Ichi Takumi, Keiichi Tokuda

the 2nd International Conference on Human Agent Interaction (HAI 2014), ACM digital library, 320 - 330 2014年10月

Language: English. Publication type: Research paper (international conference proceedings)

統計的音声対話システムにおける音素系列を用いた頑健な応答選択

佐伯昌幸, 李晃伸

音声言語情報処理研究会SIG-SLP第101回研究会 2014年05月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

ユーザ生成型音声対話システムにおけるクリエイターとユーザの相互刺激によるインセンティブ向上の検討

飯塚遼, 李晃伸

音声言語情報処理研究会SIG-SLP第101回研究会 2014年05月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

条件付き確率場に基づく仮説の逐次早期確定を用いた低遅延音声インタフェース

伊神陽介，李晃伸，徳田恵一，南角吉彦

音響学会講演論文集 63 - 64 2014年03月

Language: Japanese. Publication type: Research paper (academic journal)

大語彙連続音声認識における単語信頼度に基づく単語固有ノードの枝刈り手法の検討

小林大晃, 伊藤直晃, 李晃伸

日本音響学会2014年春季研究発表会講演論文集 2014年03月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

統計的音声対話システムにおける登録キーワードの近傍単語を優先した仮説生成に基づく応答選択

小升章裕, 南角吉彦, 李晃伸, 徳田恵一

日本音響学会2014年春季研究発表会講演論文集 3-Q5-13 - 224 2014年03月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.). Publisher: 日本音響学会

条件付き確率場に基づく仮説の逐次早期確定を用いた低遅延音声インタフェース

伊神陽介, 李晃伸, 徳田恵一, 南角吉彦

日本音響学会2014年春季研究発表会講演論文集 2-4-7 2014年03月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

ユーザ生成型音声対話コンテンツに向けた有限状態トランスデューサに基づく簡潔な対話記述法の検討

船谷内泰斗, 大浦圭一郎, 南角吉彦, 李晃伸, 徳田恵一

音響学会講演論文集 223 - 224 2013年09月

Language: Japanese. Publication type: Research paper (academic journal)

MMDAgent --- A Fully Open-Source Toolkit for Voice Interaction Systems (peer-reviewed, international journal)

Akinobu Lee, Keiichiro Oura, Keiichi Tokuda

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2013 8382 - 8385 2013年05月

Role: Lead author. Language: English. Publication type: Research paper (international conference proceedings)

DOI： 10.1109/ICASSP.2013.6639300

スマートフォン単体で動作する音声対話3Dエージェント「スマートメイちゃん」の開発

山本大介, 大浦圭一郎, 李晃伸　他

情報処理学会インタラクション 675 - 680 2013年03月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

ユーザ参加型双方向音声案内デジタルサイネージシステムの開発・設置・運用事例 (invited)

徳田恵一, 大浦圭一郎, 李晃伸, 山本大介, 打矢隆弘, 内匠逸

日本音響学会2013年春季研究発表会論文集 119 - 122 2013年03月

Language: Japanese. Publication type: Research paper (academic journal)

キャンパスの公共空間におけるユーザ参加型双方向音声案内デジタルサイネージシステム (peer-reviewed)

大浦圭一郎, 山本大介, 内匠逸, 李晃伸, 徳田恵一

人工知能学会論文誌 28 ( 1 ) 60 - 67 2013年01月

Language: Japanese. Publication type: Research paper (academic journal). Publisher: 人工知能学会

Other link: http://id.nii.ac.jp/1004/00008160/

10年間の長期運用を支えた音声情報案内システム「たけまるくん」の技術 (peer-reviewed)

西村竜一, 原直, 川波弘道, 李晃伸, 鹿野清宏

人工知能学会論文誌 28 ( 1 ) 52 - 59 2013年01月

Language: Japanese. Publication type: Research paper (academic journal). Publisher: 一般社団法人人工知能学会

DOI： 10.11517/jjsai.28.1_52

Other link: http://id.nii.ac.jp/1004/00008159/

ドライバの社会性に関するCharacter自動推定

神沼充伸, 西崎友規子, ブエ・ステファン, 南角吉彦, 李晃伸

Human Interface 2012予稿集 2012年09月

Language: Japanese. Publication type: Research paper (other academic conference materials, etc.)

登録キーワードと汎用言語モデルを用いた音声認識部・応答選択部の密結合に基づく統計的音声対話システム

平野隆司, 加藤杏樹, 南角吉彦, 李晃伸, 徳田恵一

2012 Information Processing Society of Japan 2012-SLP-92 ( 3 ) 1 - 6 2012年07月

Language: Japanese. Publication type: Research paper (academic journal)

双方向音声デジタルサイネージのための学内イベント登録システム

山本大介, 大浦圭一郎, 李晃伸, 打矢隆弘, 内匠逸, 徳田恵一, 松尾啓志

大学ITC推進協議会2011年度年次大会 2011年12月

Language: Japanese. Publication type: Research paper (other academic conference materials, etc.)

魅力ある音声インタラクションシステムを構築するためのオープンソースツールキットMMDAgent

李晃伸, 大浦圭一郎, 徳田恵一

Technical Report of IEICE 1 - 6 2011年12月

Language: Japanese. Publication type: Research paper (other academic conference materials, etc.)

Speech recognition based on statistical models including multiple phonetic decision trees (peer-reviewed)

Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Acoustical Science and Technology 32 ( 6 ) 236 - 243 2011年11月

Language: English. Publication type: Research paper (academic journal)

連続音声認識における仮説の低遅延逐次確定アルゴリズムの評価

大野博之, 南角吉彦, 李晃伸, 徳田恵一

日本音響学会2011年秋季研究発表会論文集 45 - 46 2011年09月

Language: Japanese. Publication type: Research paper (other academic conference materials, etc.)

Evaluation of Tree-Trellis Based Decoding on Over-Million LVCSR (peer-reviewed)

Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Proc. ISCA Interspeech2011 1937 - 1940 2011年08月

Language: English. Publication type: Research paper (international conference proceedings)

Bayesian context clustering using cross validation for speech recognition (co-authored, peer-reviewed)

Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

IEICE Transactions on Information and Systems E94-D ( 3 ) 668 - 678 2011年03月

Language: English. Publication type: Research paper (academic journal)

DOI： 10.1587/transinf.E94.D.668

Evaluation of Tree-trellis based Decoding in Over-million LVCSR (peer-reviewed)

Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5 1948 - 1951 2011年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC

Very large vocabulary continuous speech recognition (CSR) that can recognize every sentence is one of the important goals in speech recognition. Several attempts have been made to achieve very large vocabulary CSR. However, very large vocabulary CSR using a tree-trellis based decoder has not been reported. We report the performance evaluation and improvement of the "Julius" tree-trellis based decoder in large vocabulary CSR (LVCSR) with a vocabulary of more than one million words, referred to here as over-million LVCSR. Experiments indicated that Julius achieved a word accuracy of about 91% and a real-time factor of about 2 in over-million LVCSR for Japanese newspaper speech transcription.

Speech recognition based on statistical models including multiple phonetic decision trees (peer-reviewed)

Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Acoustical Science and Technology 32 ( 6 ) 236 - 243 2011年

Language: English. Publication type: Research paper (academic journal)

We propose a speech recognition technique using multiple model structures. In the use of context-dependent models, decision-tree-based context clustering is applied to find an appropriate parameter tying structure. However, context clustering is usually performed on the basis of unreliable statistics of hidden Markov model (HMM) state sequences, because the estimation of reliable state sequences requires an appropriate model structure, which cannot be obtained prior to context clustering. Therefore, context clustering and the estimation of state sequences essentially cannot be performed independently. To overcome this problem, we propose a technique for optimizing state sequences based on an annealing process using multiple decision trees. In this technique, a new likelihood function is defined in order to treat multiple model structures, and the deterministic annealing expectation maximization algorithm is used as the training algorithm. Experimental continuous phoneme recognition results show that the proposed method, using only two decision trees, achieved a relative error reduction of about 11.1% over the conventional method. © 2011 The Acoustical Society of Japan.

DOI： 10.1250/ast.32.236

Voice activity detection based on conditional random fields using multiple features (co-authored, peer-reviewed)

Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Proc. Conference of the International Speech Communication Association (INTERSPEECH) 2086 - 2089 2010年09月

Language: English. Publication type: Research paper (international conference proceedings)

Speaker Adaptation Based on Nonlinear Spectral Transform for Speech Recognition (co-authored, peer-reviewed)

Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Proc. Conference of the International Speech Communication Association (INTERSPEECH) 542 - 545 2010年09月

Language: English. Publication type: Research paper (international conference proceedings)

A Covariance-Tying Technique for HMM-Based Speech Synthesis (co-authored, peer-reviewed)

Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

IEICE Transactions on Information and Systems 93 ( 3 ) 595 - 601 2010年03月

Language: English. Publication type: Research paper (academic journal)

DOI： 10.1587/transinf.E93.D.595

音声認識のデコーダと認識エンジン (peer-reviewed)

李晃伸

日本音響学会誌日本音響学会 66 ( 1 ) 28 - 31 2010年01月

Language: English. Publication type: Research paper (academic journal)

SuperHマイコンへの搭載を目的とした連続音声認識ソフトウェアJuliusの計算量削減 (peer-reviewed)

小窪浩明, 畑岡信夫, 李晃伸, 河原達也, 鹿野清宏

情報処理学会論文誌 50 ( 11 ) 2597 - 2606 2009年11月

Language: Japanese. Publication type: Research paper (academic journal). Publisher: 情報処理学会

Development of a Toolkit for Spoken Dialog System with an Anthropomorphic Agent: Galatea (peer-reviewed)

Kouichi Katsurada, Akinobu Lee, Tatsuya Kawahara, Tatsuo Yotsukura, Shigeo Morishima, Takuya Nishimoto, Yoichi Yamashita, and Tsuneo Nitta

Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 148 - 153 2009年10月

Language: English. Publication type: Research paper (other academic conference materials, etc.)

Recent Development of Open-Source Speech Recognition Engine Julius (peer-reviewed)

Akinobu Lee and Tatsuya Kawahara

Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA) 131 - 137 2009年10月

Language: English. Publication type: Research paper (other academic conference materials, etc.)

Tying Covariance Matrices to Reduce the Footprint of HMM-based Speech Synthesis Systems (peer-reviewed)

Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda

Proc. Conference of the International Speech Communication Association (INTERSPEECH) 1759 - 1762 2009年09月

Language: English. Publication type: Research paper (other academic conference materials, etc.)

総合報告ユーザ負担のない話者・環境適応性を実現する自然な音声対話処理技術の総合開発

鹿野清宏, 武田一哉, 河原達也, 河原英紀, 猿渡洋, 徳田恵一, 李晃伸, 川波弘道, 西村竜一, Randy GOMEZ, 戸田智基, 西浦敬信, 高橋徹, 坂野秀樹, 全炳河

電子情報通信学会誌 92 ( 6 ) 2009年06月

Language: Japanese. Publication type: Research paper (academic journal)

Voice Conversion based on Simultaneous Modeling of Spectrum and F0 (peer-reviewed)

Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda

Proc. IEEE International Conference on Acoustics, Speech and Signal Processing 3897 - 3900 2009年04月

Language: English. Publication type: Research paper (other academic conference materials, etc.)

Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems (peer-reviewed)

Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5 1723 - 1726 2009年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: ISCA-INST SPEECH COMMUNICATION ASSOC

This paper proposes a technique for reducing the footprint of HMM-based speech synthesis systems by tying all covariance matrices. HMM-based speech synthesis systems usually have a smaller footprint than unit-selection synthesis systems because statistics rather than speech waveforms are stored. However, further reduction is essential to put them on embedded devices, which have very small memory. Based on the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results show that the proposed technique can shrink the footprint of an HMM-based speech synthesis system while retaining the quality of synthesized speech.

VOICE CONVERSION BASED ON SIMULTANEOUS MODELING OF SPECTRUM AND F0 (peer-reviewed)

Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1- 8, PROCEEDINGS 3897 - 3900 2009年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: IEEE

This paper proposes simultaneous modeling of spectrum and F0 for voice conversion based on MSD (Multi-Space Probability Distribution) models. As a conventional technique, spectral conversion based on a GMM (Gaussian Mixture Model) has been proposed. Although this technique converts spectral feature sequences nonlinearly based on the GMM, F0 sequences are usually converted by a simple linear function. This is because F0 is undefined in unvoiced segments. To overcome this problem, we apply MSD models. The MSD-GMM makes it possible to model continuous F0 values in voiced frames and a discrete symbol representing unvoiced frames within a unified framework. Furthermore, the MSD-HMM is adopted to model long-term correlations in F0 sequences.

Speaker recognition based on Gaussian mixture models using variational Bayesian method

Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

電子情報通信学会技術研究報告 108 ( 338 ) 185 - 190 2008年12月

Language: English. Publication type: Research paper (workshop or symposium materials, etc.)

Speech recognition based on statistical models including multiple decision trees

Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

電子情報通信学会技術研究報告 108 ( 338 ) 221 - 226 2008年12月

Language: English. Publication type: Research paper (workshop or symposium materials, etc.)

A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System (peer-reviewed)

Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS E91D ( 11 ) 2693 - 2700 2008年11月

Language: English. Publication type: Research paper (academic journal). Publisher: IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG

In a hidden Markov model (HMM), state duration probabilities decrease exponentially with time, which fails to adequately represent the temporal structure of speech. One solution to this problem is to integrate state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM). However, although a number of attempts to use HSMMs in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding. By avoiding these approximations using a generalized forward-backward algorithm, a context-dependent duration modeling technique and weighted finite-state transducers (WFSTs), we construct a fully consistent HSMM-based speech recognition system. In a speaker-dependent continuous speech recognition experiment, our system achieved about 9.1% relative error reduction over the corresponding HMM-based system.

DOI： 10.1093/ietisy/e91-d.11.2693
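
As a small illustration of the opening claim in the abstract above: in a standard HMM, a state whose self-transition probability is a is occupied for exactly d frames with probability a^(d-1) (1 - a), a geometric distribution that decays exponentially in d. The sketch below computes this implicit duration distribution. The function name and the value a = 0.8 are illustrative assumptions and are not taken from the paper.

```python
def hmm_state_duration_pmf(self_loop_prob: float, max_duration: int) -> list[float]:
    """Implicit duration distribution of one HMM state.

    P(d) = a^(d-1) * (1 - a) for d = 1..max_duration, where a is the
    self-transition probability (a hypothetical value in this sketch).
    """
    a = self_loop_prob
    return [(a ** (d - 1)) * (1.0 - a) for d in range(1, max_duration + 1)]


# Illustrative assumption: a state with self-loop probability 0.8.
pmf = hmm_state_duration_pmf(0.8, 10)

# Each extra frame multiplies the probability by the same factor a,
# which is what "decreases exponentially with time" means here.
ratios = [pmf[i + 1] / pmf[i] for i in range(len(pmf) - 1)]
```

An HSMM replaces this implicit geometric form with an explicit duration distribution for each state.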

Acoustic modeling based on model structure annealing for speech recognition (peer-reviewed)

Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Proceedings of Interspeech 2008 932 - 935 2008年09月

Language: English. Publication type: Research paper (international conference proceedings)

複数の音素決定木を用いた音声認識の検討

塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

日本音響学会2008年秋季研究発表会講演論文集 125 - 126 2008年09月

Language: Japanese. Publication type: Research paper (other academic conference materials, etc.)

Speaker recognition based on variational Bayesian method (peer-reviewed)

Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Proceedings of Interspeech 2008 1417 - 1420 2008年09月

Language: English. Publication type: Research paper (international conference proceedings)

Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition (peer-reviewed)

Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Proceedings of Interspeech 2008 936 - 939 2008年09月

Language: English. Publication type: Research paper (international conference proceedings)

クロスバリデーションを用いたベイズ基準によるコンテキストクラスタリング

橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

日本音響学会2008年春季研究発表会講演論文集 69 - 70 2008年03月

Language: Japanese. Publication type: Research paper (other academic conference materials, etc.)

変分ベイズ法に基づく話者認識

伊藤達也, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

日本音響学会2008年春季研究発表会講演論文集 143 - 144 2008年03月

Language: Japanese. Publication type: Research paper (other academic conference materials, etc.)

Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System (peer-reviewed)

Tobias Cincarek, Hiromichi Kawanami, Ryuichi Nisimura, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano

IEICE Transactions 91-D ( 3 ) 576 - 587 2008年

DOI： 10.1093/ietisy/e91-d.3.576

Probabilistic Answer Selection Based on Conditional Random Fields for Spoken Dialog System (peer-reviewed)

Yoshitaka Yoshimi, Ryota Kakitsuba, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5 215 - 218 2008年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: ISCA-INST SPEECH COMMUNICATION ASSOC

A probabilistic answer selection method for a spoken dialog system based on Conditional Random Fields (CRFs) is described. The probabilities of answers to a question are trained by CRFs based on the lexical and morphological properties of each word, and the most likely answer for the recognized word sequence of the question utterance is chosen as the system output. Various sets of feature functions were evaluated on real data from a speech-oriented information kiosk system, and it is shown that the morphological properties have a positive effect on response accuracy. Training with recognizer output for the training database instead of manual transcriptions was also investigated. It was also shown that the proposed scheme can achieve higher accuracy than conventional keyword-based answer selection.

変分ベイズ法に基づく音声認識のためのハイパーパラメータの共有構造

橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

日本音響学会2007年秋季研究発表会講演論文集 139 - 142 2007年09月

Language: Japanese. Publication type: Research paper (other academic conference materials, etc.)

音声認識のための音素決定木構造のアニーリングに基づく音響モデリング

塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

日本音響学会2007年秋季研究発表会講演論文集 143 - 146 2007年09月

Language: Japanese. Publication type: Research paper (other academic conference materials, etc.)

音素決定木構造のアニーリングに基づく音響モデリング

塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

電子情報通信学会技術研究報告 107 ( 165 ) 67 - 72 2007年07月

Language: Japanese. Publication type: Research paper (workshop or symposium materials, etc.)

ロボットにおける音声認識技術

李晃伸, 西村竜一

計測と制御 = Journal of the Society of Instrument and Control Engineers 46 ( 6 ) 441 - 446 2007年06月

Language: Japanese. Publisher: The Society of Instrument and Control Engineers

DOI： 10.11499/sicejl1962.46.441

Other link: https://jlc.jst.go.jp/DN/JALC/00295524175?from=CiNii

Insights gained from development and long-term operation of a real-environment speech-oriented guidance system (peer-reviewed)

Tobias Cincarek, Ryuichi Nisimura, Akinobu Lee, Kiyohiro Shikano

2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3 157 - + 2007年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: IEEE

This paper presents insights gained from operating a public speech-oriented guidance system. A real-environment speech database (300 hours) collected with the system over four years is described and analyzed regarding usage frequency, content and diversity. With the first two years of the data completely transcribed, simulation of system development and evaluation of system performance over time are possible. The database is employed for acoustic and language modeling as well as for construction of a question and answer database. Since the system input is not text but speech, the database also enables research on open-domain speech-based information access. In addition, research on unsupervised acoustic modeling, language modeling and system portability can be carried out. A performance evaluation of the system at an early stage and at a late stage, when two years of real-environment data are used for constructing all system components, shows the relative importance of developing each system component. The system's response accuracy is 83% for adults and 68% for children.

Real-time continuous speech recognition system on SH-4A microprocessor (peer-reviewed)

Hiroaki Kokubo, Nobuo Hataoka, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING 35 - + 2007年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: IEEE

To extend continuous speech recognition (CSR) software to mobile environments, we have developed an embedded version of Julius (embedded Julius). Julius is open-source CSR software that has been used by many researchers and developers in Japan as a standard decoder on PCs. In this paper, we describe an implementation of the embedded Julius on an SH-4A microprocessor. The SH-4A is a high-end 32-bit MPU (720 MIPS) with an on-chip FPU. However, further computational reduction is necessary for the embedded Julius to operate in real time. With some optimizations applied, the embedded Julius achieves real-time processing on the SH-4A. The experimental results show 0.89 x RT (real time), which is 4.0 times faster than the baseline CSR. We also evaluated the embedded Julius on a large vocabulary (20,000 words), where it shows almost real-time processing (1.25 x RT).
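
For readers unfamiliar with the "x RT" figures in the abstract above, the real-time factor is commonly defined as processing time divided by audio duration, so values at or below 1.0 mean the decoder keeps up with incoming speech. The following is a minimal sketch of that arithmetic. The function name and the 10-second utterance are hypothetical, and the implied baseline figure is an inference from the reported 4.0x speed-up, not a number stated in the abstract.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Real-time factor: time spent decoding divided by audio duration."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# Hypothetical example: a 10 s utterance decoded in 8.9 s gives 0.89 x RT.
rtf = real_time_factor(8.9, 10.0)
is_real_time = rtf <= 1.0

# Inference only: if 0.89 x RT is 4.0 times faster than the baseline,
# the baseline would correspond to roughly 0.89 * 4.0 x RT.
implied_baseline_rtf = 0.89 * 4.0
```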

Hyperparameter estimation for speech recognition based on variational Bayesian approach

Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

Proceedings of ASA & ASJ Joint Meeting 3042 - 3042 2006年11月

Language: English. Publication type: Research paper (international conference proceedings)

実環境における子供音声認識のための音韻モデルおよび教師なし話者適応の評価 (peer-reviewed)

鮫島充, Randy Gomez, 李晃伸, 猿渡洋, 鹿野清宏

情報処理学会論文誌 47 ( 7 ) 2295 - 2304 2006年07月

Language: Japanese. Publication type: Research paper (international conference proceedings)

An HMM-based Singing Voice Synthesis System (peer-reviewed)

Keijiro Saino, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 2274 - 2277 2006年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: ISCA-INST SPEECH COMMUNICATION ASSOC

The present paper describes a corpus-based singing voice synthesis system based on hidden Markov models (HMMs). This system employs HMM-based speech synthesis to synthesize singing voices. Musical information such as lyrics, tones, and durations is modeled simultaneously in a unified framework of the context-dependent HMM. It can mimic the voice quality and singing style of the original singer. Results of a singing voice synthesis experiment show that the proposed system can synthesize a smooth and natural-sounding singing voice.

Voice Conversion Based on Mixtures of Factor Analyzers (peer-reviewed)

Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Akinobu Lee, Keiichi Tokuda

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 2278 - + 2006年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: ISCA-INT SPEECH COMMUNICATION ASSOC

This paper describes voice conversion based on Mixtures of Factor Analyzers (MFA), which can provide efficient modeling with a limited amount of training data. As a typical spectral conversion method, a mapping algorithm based on the Gaussian Mixture Model (GMM) has been proposed. In this method, two kinds of covariance matrix structures are often used: diagonal and full covariance matrices. A GMM with diagonal covariance matrices requires a large number of mixture components to estimate spectral features accurately. On the other hand, a GMM with full covariance matrices needs sufficient training data to estimate model parameters. To cope with these problems, we apply MFA to voice conversion. MFA can be regarded as an intermediate model between GMMs with diagonal covariance and with full covariance. Experimental results show that MFA can improve the conversion accuracy compared with the conventional GMM.
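
To make the "intermediate model" remark in the abstract above concrete: a factor analyzer models each component covariance as Lambda Lambda^T + Psi, with a D x q loading matrix Lambda and a diagonal Psi, so its raw parameter count falls between the diagonal and full cases when q is small. The sketch below counts covariance parameters per mixture component. The dimension 48 and the 4 factors are illustrative assumptions rather than the paper's settings, and the count ignores the rotational redundancy of Lambda.

```python
def covariance_param_counts(dim: int, num_factors: int) -> dict[str, int]:
    """Raw covariance parameter counts for one mixture component."""
    return {
        # Diagonal covariance: one variance per dimension.
        "diagonal": dim,
        # Factor analyzer: D x q loading matrix plus a diagonal noise term.
        "mfa": dim * num_factors + dim,
        # Full covariance: symmetric D x D matrix.
        "full": dim * (dim + 1) // 2,
    }


# Illustrative assumption: 48-dimensional features, 4 latent factors.
counts = covariance_param_counts(dim=48, num_factors=4)
```

With these assumed sizes the factor-analyzer structure needs far fewer parameters than a full covariance matrix, which is consistent with the abstract's point about limited training data.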

Reducing Computation on Parallel Decoding using Frame-wise Confidence Scores (peer-reviewed)

Tomohiro Hakamata, Akinobu Lee, Yoshihiko Nankaku, Keiichi Tokuda

INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5 1638 - 1641 2006年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: ISCA-INST SPEECH COMMUNICATION ASSOC

Parallel decoding based on multiple models has been studied as a way to cover various conditions and speakers at once in a speech recognition system. However, running many recognizers in parallel with all models causes the total computational cost to grow in proportion to the number of models. In this paper, an efficient way of finding and pruning unpromising decoding processes during search is proposed. By comparing temporal search statistics at each frame among all decoders, decoders with relatively unmatched models can be pruned in the middle of the recognition process to save computational cost. This method allows the model structures to be mutually independent. Two frame-wise pruning measures, based on maximum hypothesis likelihoods and top confidence scores respectively, and their combinations are investigated. Experimental results on parallel recognition with seven acoustic models showed that, by using both criteria, the total computational cost was reduced to 36.53% of that of full computation without degrading the recognition accuracy.

Hidden semi-Markov model based speech recognition system using weighted finite-state transducer (peer-reviewed)

Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13 33 - 36 2006年

Language: English. Publication type: Research paper (international conference proceedings). Publisher: IEEE

In hidden Markov models (HMMs), state duration probabilities decrease exponentially with time, which is an inappropriate representation of the temporal structure of speech. One solution to this problem is to integrate state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM) [1]. Although a number of attempts to use explicit duration models in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding.
In the present paper, a fully consistent speech recognition system based on the HSMM framework is proposed. In a speaker-dependent continuous speech recognition experiment, the HSMM-based speech recognition system achieved about 5.9% relative error reduction over the corresponding HMM-based one.

Embedded Julius: Continuous speech recognition software for microprocessor (peer-reviewed)

Hiroaki Kokubo, Nobuo Hataoka, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

2006 IEEE WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING 378 - + 2006

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

To extend CSR (continuous speech recognition) software to mobile environments, we have developed an embedded version of "Julius". Julius is open-source CSR software and has been used by many researchers and developers in Japan as a standard decoder on PCs. Julius works as a real-time decoder on a PC. However, further reduction of computation is necessary to use Julius on a microprocessor. To reduce the cost of calculating pdfs (probability density functions), Julius adopts a GMS (Gaussian Mixture Selection) method. In this paper, we modify the GMS method to realize a continuous speech recognizer on microprocessors. This approach does not change the structure of the acoustic models, keeping them consistent with those used by conventional Julius, and enables developers to use acoustic models developed with popular modeling tools. In simulation, the proposed method achieved a 20% reduction in computational cost compared to conventional GMS and a 40% reduction compared to no GMS. Finally, the embedded version of Julius was tested on a development hardware platform named "T-engine". The proposed method achieved an RTF (Real Time Factor) of 2.23, which is 79% of the RTF without GMS, without any degradation of recognition performance.

Embedded julius on T-Engine platform [Peer-reviewed]

Nobuo Hataoka, Hiroaki Kokubo, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

2006 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS, VOLS 1 AND 2 37 - + 2006

Language: English; Publication type: Research paper (international conference proceedings); Publisher: IEEE

In this paper, we report implementation results for an embedded version of Julius. We used T-Engine (TM), which has a SuperH microprocessor, as the hardware platform. Julius is free and open Continuous Speech Recognition (CSR) software that runs on Personal Computers (PCs), which have large CPU power and memory capacity. The technical problems in making an embedded version of Julius are reducing its computation and its memory usage. We achieved an RTF (Real Time Factor) of 2.23 for embedded speech recognition with a 5000-word vocabulary without any degradation of recognition accuracy.

Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents. [Peer-reviewed]

Shinichi Kawamoto, Hiroshi Shimodaira, Tsuneo Nitta, Takuya Nishimoto, Satoshi Nakamura, Katsunobu Itou, Shigeo Morishima, Tatsuo Yotsukura, Atsuhiko Kai, Akinobu Lee, Yoichi Yamashita, Takao Kobayashi, Keiichi Tokuda, Keikichi Hirose, Nobuaki Minematsu, Atsushi Yamada, Yasuharu Den, Takehito Utsuro, Shigeki Sagayama

Life-like characters - tools, affective functions, and applications. 187 - 212 2004

Publisher: Springer

Recent progress of open-source LVCSR engine Julius and Japanese model repository - Software of continuous speech recognition consortium

Tatsuya Kawahara, Akinobu Lee, Kazuya Takeda, Katsunobu Itou, Kiyohiro Shikano

8th International Conference on Spoken Language Processing, ICSLP 2004 3069 - 3072 2004

Publication type: Research paper (international conference proceedings)

The Continuous Speech Recognition Consortium (CSRC) was founded to further enhance the Japanese Dictation Toolkit, which had been developed with the support of a Japanese agency. This paper reports an overview of its software products. The open-source LVCSR (large vocabulary continuous speech recognition) engine Julius has been improved in both performance and functionality, and it has also been ported to Microsoft Windows in compliance with SAPI (Speech API). The software is now used for a number of languages and many applications. For plug-and-play speech recognition in various applications, we have also compiled a repository of acoustic and language models for Japanese. In particular, the set of acoustic models provides wider coverage of user age groups and speech-input environments.

Real-time word confidence scoring using local posterior probabilities on tree trellis search

Akinobu Lee, Kiyohiro Shikano, Tatsuya Kawahara

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 1 I793 - I796 2004

Language: English; Publication type: Research paper (international conference proceedings)

Confidence scoring based on word posterior probability is usually performed as a post-process of speech recognition decoding, and it needs a large number of word hypotheses to obtain sufficient confidence quality. We propose a simple way of computing word confidence using estimated posterior probabilities while decoding. At word expansion in the stack decoding search, the local sentence likelihoods, which contain heuristic scores for the unreached segment, are used directly to compute the posterior probabilities. Experimental results showed that, although the likelihoods are not optimal, the method provides slightly better confidence measures than N-best lists, while the computation is faster than the 100-best method because no N-best decoding is required.
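
One way to picture the posterior computation in this abstract is as a normalization over the competing words at a single expansion point. The sketch below is only an assumed form: the function name, the scaling factor `alpha`, and the toy scores are illustrative and are not taken from the paper.

```python
import math


def word_confidences(log_likelihoods, alpha=1.0):
    """Turn competing local sentence log-likelihoods into posteriors.

    log_likelihoods: dict word -> log-likelihood of the hypothesis extended by that word
    alpha: hypothetical scaling factor that flattens the likelihood range
    """
    scaled = {w: alpha * s for w, s in log_likelihoods.items()}
    m = max(scaled.values())  # subtract the max for numerical stability
    denom = sum(math.exp(s - m) for s in scaled.values())
    return {w: math.exp(s - m) / denom for w, s in scaled.items()}


# Toy scores for three candidate words at one expansion point.
conf = word_confidences({"tokyo": -100.0, "kyoto": -102.0, "osaka": -110.0}, alpha=0.5)
best_word = max(conf, key=conf.get)
```

Because only the candidates already present at the expansion point are needed, nothing here requires a separate N-best pass, which matches the motivation stated in the abstract.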

擬人化音声対話エージェント基本ソフトウェアの開発プロジェクト報告(プロジェクト紹介(2))(第5回音声言語シンポジウム)

嵯峨山, 茂樹, 伊藤, 克亘, 宇津呂, 武仁, 甲斐, 充彦, 小林, 隆夫, 下平, 博, 伝, 康晴, 徳田, 恵一, 中村, 哲, 西本, 卓也, 新田, 恒雄, 広瀬, 啓吉, 峯松, 信明, 森島, 繁生, 山下, 洋一, 山田, 篤, 李, 晃伸

情報処理学会研究報告. SLP, 音声言語情報処理 2003 ( 124 ) 319 - 324 December 2003

Language: Japanese; Publication type: Research paper (academic journal); Publisher: 一般社団法人電子情報通信学会

擬人化音声対話エージェントツールキットGalatea

嵯峨山, 茂樹, 川本, 真一, 下平, 博, 新田, 恒雄, 西本, 卓也, 中村, 哲, 伊藤, 克亘, 森島, 繁生, 四倉, 達夫, 甲斐, 充彦, 李, 晃伸, 山下, 洋一, 小林, 隆夫, 徳田, 恵一, 広瀬, 啓吉, 峯松, 信明, 山田, 篤, 伝, 康晴, 宇津呂, 武仁

情報処理学会研究報告. SLP, 音声言語情報処理 2003 ( 14 ) 57 - 64 February 2003

Language: Japanese; Publication type: Research paper (academic journal); Publisher: 一般社団法人情報処理学会

相補的バックオフを用いた言語モデル融合ツールの構築

情報処理学会論文誌 43 ( 9 ) 2884 - 2893 September 2002

Language: Japanese

カスタマイズ性を考慮した擬人化音声対話のソフトウェアツールキットの設計 [Peer-reviewed]

川本真一, 下平博, 新田恒雄, 西本卓也, 中村哲, 伊藤克亘, 森島繁生, 四倉達夫, 甲斐充彦, 李晃伸, 山下洋一, 小林隆夫, 徳田恵一, 広瀬啓吉, 峯松信明, 山田篤, 伝康晴, 宇津呂武仁, 嵯峨山茂樹

情報処理学会論文誌 43 ( 7 ) 2249 - 2264 May 2002

Language: Japanese; Publication type: Research paper (academic journal); Publisher: 情報処理学会

擬人化音声対話エージェント開発プロジェクト

嵯峨山, 茂樹, 伊藤, 克亘, 宇津呂, 武仁, 甲斐, 充彦, 小林, 隆夫, 下平, 博, 伝, 康晴, 徳田, 恵一, 中村, 哲, 西本, 卓也, 新田, 恒雄, 広瀬, 啓吉, 森島, 繁生, 峯松, 信明, 山下, 洋一, 山田, 篤, 李, 晃伸

日本音響学会研究発表会講演論文集 2002 ( 1 ) 27 - 28 March 2002

Language: Japanese; Publication type: Research paper (academic journal)

擬人化音声対話エージェントツールキットの基本設計

川本, 真一, 下平, 博, 新田, 恒雄, 西本, 卓也, 中村, 哲, 伊藤, 克亘, 森島, 繁生, 四倉, 達夫, 甲斐, 充彦, 李, 晃伸, 山下, 洋一, 小林, 隆夫, 徳田, 恵一, 広瀬, 啓吉, 峯松, 信明, 山田, 篤, 伝, 康晴, 宇津呂, 武仁, 嵯峨山, 茂樹

情報処理学会研究報告. HI, ヒューマンインタフェース研究会報告 2002 ( 10 ) 61 - 66 February 2002

Language: Japanese; Publication type: Research paper (academic journal); Publisher: 一般社団法人情報処理学会

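The authors conceived and implemented an anthropomorphic spoken dialog agent system in which the face image can be easily replaced, the speech synthesizer can be adapted to a speaker, the dialog control description can be easily changed, each of these functional modules can itself be easily swapped for another module, and the number of processing hardware units can be handled flexibly. The interfaces of the modules are handled uniformly, and the module integration mechanism is realized by a simple method in which input and output between modules use the standard input/output of UNIX systems. Prototype agents were built for several simple dialog tasks to confirm how far the required functions were achieved. In addition, a new module that controls the face image synthesis module could be added easily.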
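
The standard input/output integration described in this abstract can be pictured with a small Python sketch. Everything here is hypothetical: the toy module, the command strings, and the "ack" reply format are invented for illustration and are not the Galatea protocol. The point is only that a module which reads lines from stdin and writes lines to stdout can be driven, or replaced, without the caller knowing its internals.

```python
import subprocess
import sys

# Hypothetical module: reads one command per line on stdin and
# writes one reply per line on stdout.
MODULE_SOURCE = r"""
import sys
for line in sys.stdin:
    sys.stdout.write("ack " + line.strip() + "\n")
    sys.stdout.flush()
"""


def run_module(commands):
    """Send commands to the module over its stdin and collect its stdout lines."""
    result = subprocess.run(
        [sys.executable, "-c", MODULE_SOURCE],
        input="\n".join(commands) + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


replies = run_module(["speak hello", "face smile"])
```

Swapping in a different module only requires changing the command that is launched, as long as the replacement keeps the same line-based convention.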

日本語ディクテーション基本ソフトウェア(99年度版) [Peer-reviewed]

河原達也, 李晃伸, 小林哲則, 武田一哉, 峯松信明, 嵯峨山茂樹, 伊藤克亘, 伊藤彰則, 山本幹雄, 山田篤, 宇津呂武仁, 鹿野清宏

日本音響学会誌 57 ( 3 ) 210 - 214 March 2001

Language: Japanese; Publication type: Research paper (academic journal); Publisher: 日本音響学会

DOI： 10.20697/jasj.57.3_210

Julius - An open source real-time large vocabulary recognition engine

Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology 1691 - 1694 2001

Language: English; Publication type: Research paper (international conference proceedings); Publisher: International Speech Communication Association

Julius is a high-performance, two-pass LVCSR decoder for researchers and developers. Based on word 3-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs in a 20k-word dictation task. Major search techniques are fully incorporated, such as tree lexicon, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning, and Gaussian selection. Besides search efficiency, it is also carefully modularized to be independent of model structures, and various HMM types are supported, such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted so that it works with other free modeling toolkits. The main platforms are Linux and other Unix workstations, and it partially works on Windows. Julius is distributed under an open license together with its source code, and has been used by many researchers and developers in Japan.

Gaussian mixture selection using context-independent HMM [Peer-reviewed]

Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 1 69 - 72 2001

Language: English; Publication type: Research paper (academic journal)

We address a method to efficiently select Gaussian mixtures for fast acoustic likelihood computation. It makes use of context-independent models for selection and back-off of the corresponding triphone models. Specifically, for the k-best phone models in the preliminary evaluation, triphone models of higher resolution are applied, and the others are assigned likelihoods from the monophone models. This selection scheme assigns more reliable back-off likelihoods to the unselected states than conventional Gaussian selection based on a VQ codebook. It can also incorporate efficient Gaussian pruning in the preliminary evaluation, which offsets the increased size of the pre-selection model. Experimental results show that the proposed method achieves performance comparable to standard Gaussian selection, and performs much better under aggressive pruning conditions. Together with phonetic tied-mixture (PTM) modeling, the acoustic matching cost is reduced to almost 14% with little loss of accuracy.

DOI： 10.1109/ICASSP.2001.940769
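
The selection and back-off scheme in this abstract can be sketched in a few lines of Python. This is a simplified illustration working at the phone level, with hypothetical names (`select_and_score`, `state_to_phone`) and toy scores; the paper's actual evaluation granularity and scoring functions are not given in the abstract.

```python
def select_and_score(mono_scores, state_to_phone, triphone_score, k):
    """Score triphone states, fully evaluating only those under the k-best phones.

    mono_scores: dict phone -> monophone log-likelihood (preliminary evaluation)
    state_to_phone: dict triphone state -> its base phone
    triphone_score: callable giving the full triphone log-likelihood of a state
    k: number of best phones whose triphone states are evaluated in full
    """
    top_phones = set(sorted(mono_scores, key=mono_scores.get, reverse=True)[:k])
    scores = {}
    for state, phone in state_to_phone.items():
        if phone in top_phones:
            scores[state] = triphone_score(state)  # high-resolution model
        else:
            scores[state] = mono_scores[phone]     # back-off to monophone score
    return scores


# Toy data: three phones and four triphone states.
mono = {"a": -5.0, "k": -9.0, "s": -20.0}
states = {"k-a+s": "a", "s-a+k": "a", "a-k+a": "k", "a-s+a": "s"}
evaluated = []


def toy_triphone_score(state):
    evaluated.append(state)  # record which states were computed in full
    return mono[states[state]] + 0.5


result = select_and_score(mono, states, toy_triphone_score, k=2)
```

With k = 2, the state under the lowest-ranked phone is never evaluated in full and simply inherits its monophone score.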

Large Vocabulary Continuous Speech Recognition using Multi-Pass Search Algorithm [Peer-reviewed]

Akinobu Lee

September 2000

Language: English; Publication type: Doctoral thesis

日本語ディクテーション基本ソフトウェア(98年度版) [Peer-reviewed]

河原達也, 李晃伸, 小林哲則, 武田一哉, 峯松信明, 伊藤克亘, 山本幹雄, 山田篤, 宇津呂武仁, 鹿野清宏

日本音響学会誌 56 ( 4 ) 255 - 259 April 2000

Language: Japanese; Publication type: Research paper (academic journal); Publisher: 日本音響学会

Free software toolkit for Japanese large vocabulary continuous speech recognition. [Peer-reviewed]

Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Shigeki Sagayama, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

Sixth International Conference on Spoken Language Processing, ICSLP 2000 / INTERSPEECH 2000, Beijing, China, October 16-20, 2000 476 - 479 2000

Publisher: ISCA

Other link: http://dblp.uni-trier.de/db/conf/interspeech/interspeech2000.html#conf/interspeech/KawaharaLKTMSIIYYUS00

A new phonetic tied-mixture model for efficient decoding [Peer-reviewed]

Akinobu Lee, Tatsuya Kawahara, Kazuya Takeda, Kiyohiro Shikano

ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 3 1269 - 1272 2000

Language: English; Publication type: Research paper (international conference proceedings); Publisher: Institute of Electrical and Electronics Engineers Inc.

A phonetic tied-mixture (PTM) model for efficient large vocabulary continuous speech recognition is presented. It is synthesized from context-independent phone models with 64 mixture components per state by assigning different mixture weights according to the shared states of triphones. The mixtures are then re-estimated for optimization. The model achieves a word error rate of 7.0% on a 20000-word newspaper dictation task, which is comparable to the best figure obtained with triphones of much higher resolution. Compared with conventional PTMs that share Gaussians across all states, the proposed model is easily trained and reliably estimated. Furthermore, the model enables the decoder to perform efficient Gaussian pruning. It is found that computing only two out of 64 components does not cause any loss of accuracy. Several pruning methods are proposed and compared, and the best one reduced the computation to about 20%.

DOI： 10.1109/ICASSP.2000.861808
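
To see why keeping only a few mixture components can leave the likelihood almost unchanged, the sketch below compares a full tied-mixture log-likelihood with one summed over the top-scoring components only. The names, the four-component codebook, and the toy values are assumptions for illustration. The sketch evaluates every component before ranking them, so it shows the approximation itself and not the paper's pruning methods, which aim to avoid that full evaluation.

```python
import math


def top_n_components(component_logpdfs, n):
    """Indices of the n highest-scoring codebook components for this frame."""
    order = sorted(range(len(component_logpdfs)),
                   key=lambda i: component_logpdfs[i], reverse=True)
    return order[:n]


def state_loglik(weights, component_logpdfs, kept):
    """log sum_i w_i * N_i(x), summed only over the kept component indices."""
    terms = [math.log(weights[i]) + component_logpdfs[i]
             for i in kept if weights[i] > 0.0]
    if not terms:
        return float("-inf")
    m = max(terms)  # log-sum-exp with the max factored out
    return m + math.log(sum(math.exp(t - m) for t in terms))


# Toy codebook of four shared components; one state's mixture weights.
logpdfs = [-1.0, -50.0, -2.0, -60.0]
weights = [0.25, 0.25, 0.25, 0.25]

kept = top_n_components(logpdfs, 2)
full = state_loglik(weights, logpdfs, range(len(logpdfs)))
pruned = state_loglik(weights, logpdfs, kept)
```

In a tied-mixture model the ranking in `top_n_components` is shared by every state that uses the same codebook, so it is done once per frame while `state_loglik` is repeated per state with that state's own weights.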

日本語ディクテーション基本ソフトウェア(97年度版) [Peer-reviewed]

河原達也, 李晃伸, 小林哲則, 武田一哉, 峯松信明, 伊藤克亘, 伊藤彰則, 山本幹雄, 山田篤, 宇津呂武仁, 鹿野清宏

日本音響学会誌 55 ( 3 ) 175 - 180 March 1999

Language: English; Publication type: Research paper (academic journal); Publisher: 日本音響学会

DOI： 10.20697/jasj.55.3_175