Papers - LEE Akinobu

Showing all papers: 1 - 135 of 135
  • Data generation for speaker diarization by speaker transition information Reviewed

    Keigo Ichikawa, Sei Ueno, and Akinobu Lee

    Asia Pacific Signal and Information Processing Association (APSIPA)   2024.12

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

    Other Link: https://www.apsipa2023.org/tprogram.html

  • Generation of speech-laugh speech using laughter representations from a large-scale pre-trained model

    木全亮太朗, 上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2024.09

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Refining Synthesized Speech Using Speaker Information and Phone Masking for Data Augmentation of Speech Recognition Reviewed

Sei Ueno, Akinobu Lee, Tatsuya Kawahara

    IEEE/ACM Transactions on Audio, Speech, and Language Processing   32   3924 - 3933   2024.09

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/TASLP.2024.3451982

    researchmap

    Other Link: https://repository.kulib.kyoto-u.ac.jp/dspace/handle/2433/289487

  • Multi-setting acoustic feature training for data augmentation of speech recognition Reviewed

    Sei Ueno, Akinobu Lee

    Acoustical Science and Technology   45 ( 4 )   195 - 203   2024.07

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (scientific journal)  

DOI: 10.1250/ast.e23.70

    researchmap

    Other Link: https://www.jstage.jst.go.jp/article/ast/45/4/45_e23.70/_article/-char/ja

  • A relationship-maintenance support system using chat dialogue aimed at collecting and conveying experiential information

    志満津 奈央, 上乃 聖, 李 晃伸

    Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing   1394 - 1399   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

    Other Link: https://www.anlp.jp/proceedings/annual_meeting/2024/index.html

  • Construction and evaluation of an Emotional Support Conversation system using large language models

    藤田 敦也, 上乃 聖, 李 晃伸

    Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing   1378 - 1383   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

    Other Link: https://www.anlp.jp/proceedings/annual_meeting/2024/index.html

  • A hierarchical story summarization method emphasizing emotion using sentiment analysis

    酒井 健壱, 上乃 聖, 李 晃伸

    Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing   1119 - 1124   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

    Other Link: https://www.anlp.jp/proceedings/annual_meeting/2024/index.html

  • Data generation for speaker diarization using speaker-turn information for three or more speakers

    市川 奎吾, 上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Speech synthesis adapted to dialogue scenes based on latent speaking styles in Japanese daily conversation

    嶋崎 純一, 上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Speech synthesis using a diffusion model with implicit nonlinear processing

    岡本 海, 上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Domain adaptation of speech recognition via speech synthesis using LLM-generated text

    上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Synthesis of non-native voice with native-like accent using voice conversion

    Iago Lourenço Correa, Sei Ueno, and Akinobu Lee

    2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

    Other Link: https://acoustics.jp/annualmeeting/program/

  • A self-projection method for a sense of shared space in spoken dialogue systems using CG agents

    東 省吾, 上乃 聖, 李 晃伸

    HAI Symposium   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • A spoken dialogue system supporting the verbalization of worry structures in counseling

    鈴木 香保, 上乃 聖, 李 晃伸

    HAI Symposium   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • An immersive spoken dialogue system using an HMD for rich nonverbal communication

    宮下 陸, 上乃 聖, 李 晃伸

    HAI Symposium   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Accent-Preserving Voice Conversion between Native-Nonnative Speakers for Second Language Learning Reviewed

    Iago Lourenço Correa, Sei Ueno, and Akinobu Lee

    Asia Pacific Signal and Information Processing Association (APSIPA)   2023.11

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

    Other Link: https://www.apsipa2023.org/tprogram.html

  • Collection of Voice Control Utterances During Driving Using Dialogue System with Question-Answering Database and Large Language Model

    2023.10

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Data augmentation for speech recognition using speech synthesis that introduces variation along the time and frequency axes

    上乃聖, 李晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2023.09

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Controlling voiced and unvoiced intervals for generating laughter in different styles

    木全 亮太朗, 上乃 聖, 李晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2023.09

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Data generation for speaker diarization of real conversational speech incorporating turn-taking frequency

    市川 奎吾, 上乃 聖, 李晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2023.09

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Automatic generation of head motion and facial expressions from speech in CG avatar dialogue

    藤岡 侑貴, 上乃 聖, 李晃伸

    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI)   2023.06

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Data augmentation for speech recognition based on speech synthesis using spectrograms with multiple settings

    上乃聖, 李晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2023.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Multi-task learning of voice activity detection and turn-end detection using Continuous Integrate-and-Fire

    池口 弘尚, 東 佑樹, 上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2023.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Evaluation of a counseling dialogue agent with continuous emotional expression

    川又 朱莉, 上乃 聖, 李 晃伸

    HAI Symposium   2023.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Controlling voiced and unvoiced intervals for diverse laughter generation

    木全亮太朗, 上乃 聖, 李 晃伸

    Proceedings of the IPSJ National Convention   2023.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Development of a CG agent with high lifelikeness and presence for hybrid autonomous/teleoperated dialogue systems

    李晃伸, 石黒浩

    96th JSAI SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) (13th Dialogue System Symposium)   2022.12

     More details

    Authorship:Lead author, Corresponding author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A 2D-CG spoken dialogue system using entrainment via a self-projected avatar

    東省吾, 李晃伸

    Human Interface Symposium 2022   2T-P2   2022.09

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Spectrogram enhancement of synthesized speech using speaker information and masking for data augmentation of speech recognition

    上乃 聖, 李 晃伸, 河原 達也

    Proceedings of the Acoustical Society of Japan Meeting   1149 - 1150   2022.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Performance comparison of transfer learning for negative-emotion recognition in automated spoken dialogue

    高井幸輝, 李晃伸, 戸田隆道, 東佑樹, 下山翔

    93rd Meeting of the JSAI SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD)   2021.11

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Eliciting user utterances during silence in automated voice response systems

    西山達也, 李晃伸, 戸田隆道, 友松祐太, 杉山雅和

    90th Meeting of the JSAI SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD)   2020.11

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Context and Knowledge Aware Dialogue System and System Combination for Grounded Response Generation Reviewed International coauthorship International journal

    Ryota Tanaka, Akihide Ozeki, Shugo Kato, Akinobu Lee

    Computer Speech & Language   62   2020.07

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Elsevier  

    DOI: 10.1016/j.csl.2020.101070

    researchmap

    Other Link: https://www.sciencedirect.com/science/article/pii/S0885230820300036

  • Fact-based Dialogue Generation with Convergent and Divergent Decoding International journal

    Ryota Tanaka, Akinobu Lee

    arXiv   2020.05

     More details

    Language:English   Publishing type:Research paper (other academic)  

    Fact-based dialogue generation is the task of generating a human-like response based on both dialogue context and factual texts. Various methods have been proposed that focus on effectively generating informative words containing facts. However, previous works implicitly assume that a topic is maintained throughout a dialogue and usually converse passively, so these systems have difficulty generating diverse responses that proactively provide meaningful information. This paper proposes an end-to-end fact-based dialogue system augmented with the ability of convergent and divergent thinking over both context and facts, which can converse about the current topic or introduce a new one. Specifically, our model incorporates a novel convergent and divergent decoding scheme that can generate informative and diverse responses considering not only the given inputs (context and facts) but also topics related to those inputs. Both automatic and human evaluation results on the DSTC7 dataset show that our model significantly outperforms state-of-the-art baselines, indicating that it can generate more appropriate, informative, and diverse responses.

    arXiv

    researchmap

  • Speaker-Aware BERT for Multi-Party Dialog Response Selection Reviewed International coauthorship International journal

    Tatsuya Nishiyama, Ryota Tanaka, Yuya Ishijima, Akinobu Lee

    Proc. AAAI2020 Dialogue System Technology Challenge 8 workshop   2020.02

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

    Other Link: https://sites.google.com/dstc.community/dstc8/aaai-20-workshop

  • Pronunciation proficiency classification of second-language learners using phoneme posterior probabilities of a language pair

    森凜太朗, 李晃伸

    IEICE Technical Committee on Speech (IEICE-SP)   2019.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Emotional expression generation using a boosting framework that emphasizes individual speaking styles

    尾関晃英, 李晃伸

    IPSJ SIG Natural Language Processing (IPSJ-NL)   2019.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A neural dialogue model grounded in external knowledge with a topic-expansion module

    田中涼太, 李晃伸

    IPSJ SIG Natural Language Processing (IPSJ-NL)   2019.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Ensemble Dialogue System for Facts-Based Sentence Generation Reviewed International coauthorship International journal

    Ryota Tanaka, Akihide Ozeki, Shugo Kato, Akinobu Lee

    Proc. AAAI2019 Dialogue System Technology Challenge 7 workshop   2019.01

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    arXiv

    researchmap

    Other Link: http://workshop.colips.org/dstc7/workshop.html

  • Machine Learning and Language Learning: English Conversation Simulator and its Design for Language Learning Reviewed

    KIMURA Mitsushige, LEE Akinobu, KAWASHIMA Hiroaki

    Journal of The Society of Instrument and Control Engineers   58 ( 11 )   873 - 877   2019

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:The Society of Instrument and Control Engineers  

    CiNii Articles

    researchmap

  • An ensemble dialogue system using external factual information and dialogue history

    田中涼太, 尾関晃英, 加藤修悟, 李晃伸

    SIG-SLUD   2018.11

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    This study aims to avoid "safe responses" by conditioning on context and external facts extracted from information websites (e.g., Wikipedia), and then generating responses grounded in real-world facts. The system is an ensemble dialogue system consisting of three sub-modules: a generation-based module, a facts-retrieval module, and a reranking module. Responses can thus be determined from various viewpoints by combining multiple systems. Experiments and evaluations were conducted on the sentence generation task of Dialog System Technology Challenges 7, where our system performed significantly better than many competing systems.

    researchmap

  • Response selection using phoneme information based on recurrent neural networks

    牧野 健一郎, 李 晃伸

    IPSJ SIG Technical Report (Spoken Language Processing)   2017-SLP-117 ( 4 )   1 - 6   2017.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Information Processing Society of Japan  

    researchmap

  • Automatic construction of a pronunciation dictionary robust to spontaneous speech using statistical G2P learning in Japanese

    寺田 卓矢, 李 晃伸

    IPSJ SIG Technical Report (Spoken Language Processing)   2017-SLP-117 ( 11 )   1 - 6   2017.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Information Processing Society of Japan  

    researchmap

  • Report on the international conference ICASSP 2017

    浅見 太一, 大谷 大和, 岡本 拓磨, 小川 哲司, 落合 翼, 亀岡 弘和, 駒谷 和範, 高木 信二, 高道 慎之介, 俵 直弘, 南條 浩輝, 橋本 佳, 福田 隆, 増村 亮, 松田 繁樹, 李 晃伸, 渡部 晋治

    117th IPSJ SIG-SLP Meeting, SLP-3   2017.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • User generated dialogue systems: uDialogue Reviewed

    Keiichi Tokuda, Akinobu Lee, Yoshihiko Nankaku, Keiichiro Oura, Kei Hashimoto, Daisuke Yamamoto, Ichi Takumi, Takahiro Uchiya, Shuhei Tsutsumi, Steve Renals, Junichi Yamagishi

    Human-Harmonized Information Technology   2   77 - 114   2017.04

     More details

    Language:English   Publishing type:Part of collection (book)   Publisher:Springer Japan  

    This chapter introduces the idea of user-generated dialogue content and describes our experimental exploration aimed at clarifying the mechanism and conditions that make it workable in practice. One of the attractive points of a speech interface is that it provides a vivid sense of interactivity that cannot be achieved with a text interface alone. This study proposes a framework in which spoken dialogue systems are separated into content that can be produced and modified by users and the systems that drive that content, and seeks to clarify (1) the requirements of systems that enable the creation of attractive spoken dialogue, and (2) the conditions for the active generation of attractive dialogue content by users, while attempting to establish a method for realizing them. Experiments for validating user-generated dialogue content were performed by installing interactive digital signage with a speech interface in public spaces as a dialogue device and implementing a content generation environment for users via the Internet. The proposed framework is expected to lead to a breakthrough in the spread of speech technology.

    DOI: 10.1007/978-4-431-56535-2_3

    Scopus

    researchmap

  • Investigation of relationship between speaking affordability and expressing common environments and knowledge in spoken dialogue system

    78 ( 78 )   125 - 128   2016.10

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    CiNii Articles

    CiNii Books

    researchmap

  • Analysis of relationship in psychological characteristics at short meetings for affable spoken dialogue systems

    78 ( 78 )   129 - 134   2016.10

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    CiNii Articles

    CiNii Books

    researchmap

  • A study of system speech-rate control based on user speech rate and utterance content for user-friendly spoken dialogue systems

    三原 寛哉, 李 晃伸

    IPSJ SIG Technical Report (SLP)   2016-SLP-112 ( 15 )   1 - 6   2016.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Information Processing Society of Japan  

    researchmap

  • Module specifications and management methods for open-content spoken dialogue systems

    山西 元樹, 船谷内 泰斗, 李 晃伸

    IPSJ SIG Technical Report (SLP)   2016-SLP-112 ( 14 )   1 - 6   2016.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Information Processing Society of Japan  

    researchmap

  • Investigating the relationship between system-initiated talk and the perception of otherness in spoken dialogue systems

    村上拓也, 李 晃伸, 西川 由里, 小島 良広, 遠藤 充

    HAI Symposium 2015   238 - 243   2015.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A study of appropriate ways to express multitasking capability in spoken dialogue interfaces

    小中 彩貴, 李 晃伸

    HAI Symposium 2015   108 - 112   2015.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Evaluating affordance through expressed reactions to the acoustic environment in spoken dialogue systems

    夏目 龍司, 李 晃伸

    HAI Symposium 2015   94 - 98   2015.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A user-generated spoken dialogue system enabling collaborative construction and extension of dialogues with history

    宮木 京介, 飯塚 遼, 李 晃伸

    Proceedings of the 2015 Autumn Meeting of the Acoustical Society of Japan   3-Q-22   2015.09

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Incremental mid-utterance confirmation of keywords using word confidence based on non-shared inter-word nodes

    松尾 涼平, 小林 大晃, 李 晃伸

    Proceedings of the 2015 Autumn Meeting of the Acoustical Society of Japan   3-Q-12   2015.09

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Prosodically-enhanced Recurrent Neural Network Language Models Reviewed

    Siva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, Akinobu Lee

    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5   2390 - 2394   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC  

    Recurrent neural network language models have been shown to consistently reduce the word error rates (WERs) of large vocabulary speech recognition tasks. In this work we propose to enhance the RNNLMs with prosodic features computed using the context of the current word. Since it is plausible to compute the prosody features at the word and syllable level we have trained the models on prosody features computed at both these levels. To investigate the effectiveness of proposed models we report perplexity and WER for two speech recognition tasks, Switchboard and TED. We observed substantial improvements in perplexity and small improvements in WER.

    Web of Science

    researchmap

  • Voice interaction system with 3D-CG virtual agent for stand-alone smartphones Reviewed

    Daisuke Yamamoto, Keiichiro Oura, Ryota Nishimura, Takahiro Uchiya, Akinobu Lee, Keiichi Tokuda, Ichi Takumi

    HAI 2014 - Proceedings of the 2nd International Conference on Human-Agent Interaction   323 - 330   2014.10

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Association for Computing Machinery, Inc  

    In this paper, we propose a voice interaction system using 3D-CG virtual agents for stand-alone smartphones. Unlike existing mobile voice interaction systems, the proposed system handles speech recognition and speech synthesis on the smartphone itself, enabling natural conversation without the delays caused by network communication. Moreover, the proposed system can be fully customized with dialogue scripts, Java-based plugins, and Android APIs, so developers can easily build original voice interaction systems for smartphones on top of it. We have made a subset of the proposed system available as open-source software. We expect that this system will contribute to studies of human-agent interaction using smartphones.

    DOI: 10.1145/2658861.2658874

    Scopus

    researchmap

  • Robust response selection using phoneme sequences in a statistical spoken dialogue system

    佐伯 昌幸, 李 晃伸

    101st Meeting of IPSJ SIG-SLP   2014.05

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A study on improving incentives through mutual stimulation between creators and users in a user-generated spoken dialogue system

    飯塚 遼, 李 晃伸

    101st Meeting of IPSJ SIG-SLP   2014.05

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A low-latency speech interface using incremental early confirmation of hypotheses based on conditional random fields

    伊神 陽介, 李 晃伸, 徳田 恵一, 南角 吉彦

    Proceedings of the Acoustical Society of Japan Meeting   63 - 64   2014.03

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • A study of pruning word-specific nodes based on word confidence in large-vocabulary continuous speech recognition

    小林 大晃, 伊藤 直晃, 李 晃伸

    Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan   2014.03

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Response selection based on hypothesis generation that prioritizes neighborhood word of keywords in statistical spoken dialogue system

    3-Q5-13 - 224   2014.03

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    CiNii Articles

    researchmap

  • A low-latency speech interface using incremental early confirmation of hypotheses based on conditional random fields

    伊神 陽介, 李 晃伸, 徳田 恵一, 南角 吉彦

    Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan   2-4-7   2014.03

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A study of a concise dialogue description method based on finite-state transducers for user-generated spoken dialogue content

    船谷内 泰斗, 大浦 圭一郎, 南角 吉彦, 李 晃伸, 徳田 恵一

    Proceedings of the Acoustical Society of Japan Meeting   223 - 224   2013.09

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • MMDAGENT - A FULLY OPEN-SOURCE TOOLKIT FOR VOICE INTERACTION SYSTEMS Reviewed International journal

    Akinobu Lee, Keiichiro Oura, Keiichi Tokuda

    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)   8382 - 8385   2013.05

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper describes the development of an open-source toolkit that makes it possible to explore a wide variety of aspects of speech interaction in spoken dialog systems and speech interfaces. The toolkit tightly integrates recent speech recognition and synthesis technologies with a 3-D CG rendering module that can manipulate expressive embodied agent characters. The software design and its interfaces are carefully crafted to make it a fully open toolkit. Ongoing public demonstration experiments indicate that it is promoting related research and development of voice interaction systems in various settings.

    DOI: 10.1109/ICASSP.2013.6639300

    Web of Science

    researchmap

  • Development of "Smart Mei-chan," a 3D spoken-dialogue agent running on a stand-alone smartphone

    山本 大介, 大浦 圭一郎, 李 晃伸, et al.

    IPSJ Interaction   675 - 680   2013.03

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Development, deployment, and operation of a user-participatory interactive voice-guidance digital signage system Invited

    徳田恵一, 大浦圭一郎, 李晃伸, 山本大介, 打矢隆弘, 内匠逸

    Proceedings of the 2013 Spring Meeting of the Acoustical Society of Japan   119 - 122   2013.03

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • On-Campus, User-Participatable, and Voice-Interactive Digital Signage(<Special Issue>Practical Issues of Spoken Dialogue Systems) Reviewed

    Keiichiro Oura, Daisuke Yamamoto, Ichi Takumi, Akinobu Lee, Keiichi Tokuda

    28 ( 1 )   60 - 67   2013.01

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    CiNii Articles

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1004/00008160/

  • Technical Advances of Speech-Oriented Guidance System "Takemaru-kun" by 10 Years of Long-Term Operation(<Special Issue>Practical Issues of Spoken Dialogue Systems) Reviewed

Ryuichi Nishimura, Sunao Hara, Hiromichi Kawanami, Akinobu Lee, Kiyohiro Shikano

    Journal of the Japanese Society for Artificial Intelligence   28 ( 1 )   52 - 59   2013.01

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:The Japanese Society for Artificial Intelligence  

    DOI: 10.11517/jjsai.28.1_52

    CiNii Articles

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1004/00008159/

  • Automatic character estimation related to driver sociability

    神沼 充伸, 西崎 友規子, ブエ・ステファン, 南角 吉彦, 李 晃伸

    Human Interface 2012 Proceedings   2012.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • A statistical spoken dialogue system based on tight coupling of speech recognition and response selection using registered keywords and a general-purpose language model

    平野隆司, 加藤杏樹, 南角吉彦, 李晃伸, 徳田恵一

    IPSJ SIG Technical Report   2012-SLP-92 ( 3 )   1 - 6   2012.07

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • A campus event registration system for interactive voice digital signage

    山本大介, 大浦圭一郎, 李晃伸, 打矢隆弘, 内匠逸, 徳田恵一, 松尾啓志

    University ICT Promotion Council, 2011 Annual Conference   2011.12

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • MMDAgent: an open-source toolkit for building engaging voice interaction systems

    李晃伸, 大浦圭一郎, 徳田恵一

    Technical Report of IEICE   1 - 6   2011.12

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Evaluation of a low-latency incremental hypothesis confirmation algorithm in continuous speech recognition

    大野博之, 南角吉彦, 李晃伸, 徳田恵一

    Proceedings of the 2011 Autumn Meeting of the Acoustical Society of Japan   45 - 46   2011.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Bayesian Context Clustering Using Cross Validation for Speech Recognition Reviewed

    Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E94-D ( 3 )   668 - 678   2011.03

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    This paper proposes Bayesian context clustering using cross validation for hidden Markov model (HMM) based speech recognition. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by treating model parameters as random variables. The variational Bayesian method, which is widely used as an efficient approximation of the Bayesian approach, has been applied to HMM-based speech recognition, and it shows good performance. Moreover, the Bayesian approach can select an appropriate model structure while taking account of the amount of training data. Since prior distributions which represent prior information about model parameters affect estimation of the posterior distributions and selection of model structure (e.g., decision tree based context clustering), the determination of prior distributions is an important problem. However, it has not been thoroughly investigated in speech recognition, and the determination technique of prior distributions has not performed well. The proposed method can determine reliable prior distributions without any tuning parameters and select an appropriate model structure while taking account of the amount of training data. Continuous phoneme recognition experiments show that the proposed method achieved a higher performance than the conventional methods.

    DOI: 10.1587/transinf.E94.D.668

    Web of Science

    researchmap

  • Evaluation of Tree-trellis based Decoding in Over-million LVCSR Reviewed

    Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5   1948 - 1951   2011

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC  

    Very large vocabulary continuous speech recognition (CSR) that can recognize every sentence is one of the important goals in speech recognition. Several attempts have been made to achieve very large vocabulary CSR. However, very large vocabulary CSR using a tree-trellis based decoder has not been reported. We report the performance evaluation and improvement of the "Julius" tree-trellis based decoder in large vocabulary CSR (LVCSR) with a vocabulary of more than one million words, referred to here as over-million LVCSR. Experiments indicated that Julius achieved a word accuracy of about 91% and a real-time factor of about 2 in over-million LVCSR for Japanese newspaper speech transcription.

    Web of Science

    researchmap

  • Speech recognition based on statistical models including multiple phonetic decision trees Reviewed

    Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Acoustical Science and Technology   32 ( 6 )   236 - 243   2011

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    We propose a speech recognition technique using multiple model structures. In the use of context-dependent models, decision-tree-based context clustering is applied to find an appropriate parameter-tying structure. However, context clustering is usually performed on the basis of unreliable statistics of hidden Markov model (HMM) state sequences, because estimating reliable state sequences requires an appropriate model structure, which cannot be obtained prior to context clustering. Therefore, context clustering and the estimation of state sequences essentially cannot be performed independently. To overcome this problem, we propose an optimization technique for state sequences based on an annealing process using multiple decision trees. In this technique, a new likelihood function is defined in order to treat multiple model structures, and the deterministic annealing expectation maximization algorithm is used as the training algorithm. Experimental continuous phoneme recognition results show that the proposed method, using only two decision trees, achieved about an 11.1% relative error reduction over the conventional method. © 2011 The Acoustical Society of Japan.

    DOI: 10.1250/ast.32.236

    Scopus

    researchmap

  • Voice activity detection based on conditional random fields using multiple features (co-authored) Reviewed

    Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proc. Conference of the International Speech Communication Association (INTERSPEECH)   2086 - 2089   2010.09

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Speaker Adaptation Based on Nonlinear Spectral Transform for Speech Recognition (co-authored) Reviewed

    Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proc. Conference of the International Speech Communication Association (INTERSPEECH)   542 - 545   2010.09

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

  • A Covariance-Tying Technique for HMM-Based Speech Synthesis Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   93 ( 3 )   595 - 601   2010.03

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
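    The footprint saving from tying can be illustrated with a back-of-the-envelope parameter count. This is a minimal sketch, assuming diagonal covariances; the feature dimension and state count below are illustrative assumptions, not figures from the paper:

```python
d = 40           # assumed feature dimension (illustrative)
n_states = 2000  # assumed number of clustered state distributions (illustrative)

# Untied: each state stores a mean vector and its own diagonal covariance.
untied_params = n_states * (d + d)

# Tied: each state stores only a mean; a single diagonal covariance is shared.
tied_params = n_states * d + d

print(untied_params, tied_params, tied_params / untied_params)
```

    Under these assumptions, tying roughly halves the stored parameters; the saving grows further if full covariance matrices are stored per state.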

    DOI: 10.1587/transinf.E93.D.595

    Web of Science

    researchmap

  • 音声認識のデコーダと認識エンジン Reviewed

    李晃伸

    日本音響学会誌   66 ( 1 )   28 - 31   2010.01

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • Speaker Adaptation Based on Nonlinear Spectral Transform for Speech Recognition Reviewed

    Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2   542 - 545   2010

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    This paper proposes a speaker adaptation technique using a nonlinear spectral transform based on GMMs. One of the most popular forms of speaker adaptation is based on linear transforms, e.g., MLLR. Although MLLR uses multiple transforms according to regression classes, only a single linear transform is applied to each state. The proposed method performs nonlinear speaker adaptation based on a new likelihood function combining HMMs for recognition with GMMs for spectral transform. Moreover, the dependency of transforms on context can also be estimated in an integrated ML fashion. The proposed technique outperformed conventional approaches in phoneme-recognition experiments.

    Web of Science

    researchmap

  • Voice activity detection based on conditional random fields using multiple features Reviewed

    Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4   2086 - 2089   2010

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    This paper proposes a Voice Activity Detection (VAD) algorithm based on Conditional Random Fields (CRF) using multiple features. VAD is a technique used to distinguish between speech and non-speech in noisy environments and is an important component in many real-world speech applications. The posterior probability of output labels in the proposed method is directly modeled by the weighted sum of the feature functions. Effective features are automatically selected by estimating appropriate weight parameters to improve the accuracy of VAD. Experimental results on the CENSREC-1-C database revealed that the proposed approach can decrease error rates by using CRF.

    Web of Science

    researchmap

  • Computational Reduction of Continuous Speech Recognition Software "Julius" on SuperH Microprocessor Reviewed

    50 ( 11 )   2597 - 2606   2009.11

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    CiNii Articles

    CiNii Books

    researchmap

  • Development of a Toolkit for Spoken Dialog System with an Anthropomorphic Agent: Galatea Reviewed

    Kouichi Katsurada, Akinobu Lee, Tatsuya Kawahara, Tatsuo Yotsukura, Shigeo Morishima, Takuya Nishimoto, Yoichi Yamashita, and Tsuneo Nitta

    Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)   148 - 153   2009.10

     More details

    Language:English   Publishing type:Research paper (other academic)  

    researchmap

  • Recent Development of Open-Source Speech Recognition Engine Julius Reviewed

    Akinobu Lee and Tatsuya Kawahara

    Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)   131 - 137   2009.10

     More details

    Language:English   Publishing type:Research paper (other academic)  

    researchmap

  • Tying Covariance Matrices to Reduce the Footprint of HMM-based Speech Synthesis Systems Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda

    Proc. Conference of the International Speech Communication Association (INTERSPEECH)   1759 - 1762   2009.09

     More details

    Language:English   Publishing type:Research paper (other academic)  

  • 総合報告 ユーザ負担のない話者・環境適応性を実現する自然な音声対話処理技術の総合開発

    鹿野清宏, 武田一哉, 河原達也, 河原英紀, 猿渡洋, 徳田恵一, 李 晃伸, 川波弘道, 西村竜一, Randy GOMEZ, 戸田智基, 西浦敬信, 高橋 徹, 坂野秀樹, 全 炳河

    電子情報通信学会誌   92 ( 6 )   2009.06

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • Voice Conversion based on Simultaneous Modeling of Spectrum and F0 Reviewed

    Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda

    Proc. IEEE International Conference on Acoustics, Speech and Signal Processing   3897 - 3900   2009.04

     More details

    Language:English   Publishing type:Research paper (other academic)  

  • Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5   1723 - 1726   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    This paper proposes a technique for reducing the footprint of HMM-based speech synthesis systems by tying all covariance matrices. HMM-based speech synthesis systems usually have smaller footprints than unit-selection synthesis systems because statistics rather than speech waveforms are stored. However, further reduction is essential to put them on embedded devices, which have very small memories. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results show that the proposed technique can shrink the footprint of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.

    Web of Science

    researchmap

  • VOICE CONVERSION BASED ON SIMULTANEOUS MODELING OF SPECTRUM AND F0 Reviewed

    Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS   3897 - 3900   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper proposes simultaneous modeling of spectrum and F0 for voice conversion based on MSD (Multi-Space Probability Distribution) models. As a conventional technique, spectral conversion based on a GMM (Gaussian Mixture Model) has been proposed. Although this technique converts spectral feature sequences nonlinearly based on the GMM, F0 sequences are usually converted by a simple linear function, because F0 is undefined in unvoiced segments. To overcome this problem, we apply MSD models. The MSD-GMM allows continuous F0 values in voiced frames and a discrete symbol representing unvoiced frames to be modeled within a unified framework. Furthermore, the MSD-HMM is adopted to model long-term correlations in F0 sequences.

    Web of Science

    researchmap

  • Speaker recognition based on Gaussian mixture models using variational Bayesian method

    Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    電子情報通信学会技術研究報告   108 ( 338 )   185 - 190   2008.12

     More details

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Speech recognition based on statistical models including multiple decision trees

    Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    電子情報通信学会技術研究報告   108 ( 338 )   221 - 226   2008.12

     More details

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E91D ( 11 )   2693 - 2700   2008.11

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    In a hidden Markov model (HMM), state duration probabilities decrease exponentially with time, which fails to adequately represent the temporal structure of speech. One of the solutions to this problem is integrating state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM). However, though a number of attempts to use HSMMs in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding. By avoiding these approximations using a generalized forward-backward algorithm, a context-dependent duration modeling technique, and weighted finite-state transducers (WFSTs), we construct a fully consistent HSMM-based speech recognition system. In a speaker-dependent continuous speech recognition experiment, our system achieved about 9.1% relative error reduction over the corresponding HMM-based system.
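    The exponential-decay property the abstract refers to can be seen directly: with self-loop probability a, the probability of occupying a plain HMM state for exactly d frames is geometric, P(d) = a^(d-1)(1 - a), so the mode is always d = 1 and the distribution can never peak at a typical phone duration. A minimal sketch (the self-loop value is an arbitrary illustration):

```python
def hmm_state_duration_prob(d, self_loop=0.7):
    """Probability of staying in a plain HMM state for exactly d frames:
    (d - 1) self-transitions followed by one exit transition."""
    return (self_loop ** (d - 1)) * (1.0 - self_loop)

# strictly decreasing in d: the most likely duration is always one frame
probs = [hmm_state_duration_prob(d) for d in range(1, 6)]
```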

    DOI: 10.1093/ietisy/e91-d.11.2693

    Web of Science

    researchmap

  • Acoustic modeling based on model structure annealing for speech recognition Reviewed

    Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of Interspeech 2008   932 - 935   2008.09

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • 複数の音素決定木を用いた音声認識の検討

    塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    日本音響学会2008年秋季研究発表会講演論文集   125 - 126   2008.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Speaker recognition based on variational Bayesian method Reviewed

    Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of Interspeech 2008   1417 - 1420   2008.09

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition Reviewed

    Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of Interspeech 2008   936 - 939   2008.09

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • クロスバリデーションを用いたベイズ基準によるコンテキストクラスタリング

    橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    日本音響学会2008年春季研究発表会講演論文集   69 - 70   2008.03

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • 変分ベイズ法に基づく話者認識

    伊藤達也, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    日本音響学会2008年春季研究発表会講演論文集   143 - 144   2008.03

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System. Reviewed

    Tobias Cincarek, Hiromichi Kawanami, Ryuichi Nisimura, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano

    IEICE Transactions   91-D ( 3 )   576 - 587   2008

     More details

  • Probabilistic Answer Selection Based on Conditional Random Fields for Spoken Dialog System Reviewed

    Yoshitaka Yoshimi, Ryota Kakitsuba, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5   215 - 218   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    Probabilistic answer selection for a spoken dialog system based on Conditional Random Fields (CRFs) is described. The probabilities of answers for a question are trained by CRFs based on the lexical and morphological properties of each word, and the most likely answer given the recognized word sequence of the question utterance is chosen as the system output. Various sets of feature functions were evaluated on real data from a speech-oriented information kiosk system, and it is shown that the morphological properties have a positive effect on response accuracy. Training with the recognizer output of the training database instead of manual transcriptions was also investigated. It was also shown that the proposed scheme can achieve higher accuracy than conventional keyword-based answer selection.

    Web of Science

    researchmap

  • 変分ベイズ法に基づく音声認識のためのハイパーパラメータの共有構造

    橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    日本音響学会2007年秋季研究発表会講演論文集   139 - 142   2007.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • 音声認識のための音素決定木構造のアニーリングに基づく音響モデリング

    塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    日本音響学会2007年秋季研究発表会講演論文集   143 - 146   2007.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • 音素決定木構造のアニーリングに基づく音響モデリング

    塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    電子情報通信学会技術研究報告   107 ( 165 )   67 - 72   2007.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Speech Recognition Techniques for Real-World Robot Application

    LEE Akinobu, NISHIMURA Ryuichi

    Journal of The Society of Instrument and Control Engineers   46 ( 6 )   441 - 446   2007.06

     More details

    Language:Japanese   Publisher:The Society of Instrument and Control Engineers  

    DOI: 10.11499/sicejl1962.46.441

    CiNii Articles

    CiNii Books

    researchmap

    Other Link: https://jlc.jst.go.jp/DN/JALC/00295524175?from=CiNii

  • Insights gained from development and long-term operation of a real-environment speech-oriented guidance system Reviewed

    Tobias Cincarek, Ryuichi Nisimura, Akinobu Lee, Kiyohiro Shikano

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3   157 - +   2007

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper presents insights gained from operating a public speech-oriented guidance system. A real-environment speech database (300 hours) collected with the system over four years is described and analyzed regarding usage frequency, content, and diversity. With the first two years of the data completely transcribed, simulation of system development and evaluation of system performance over time are possible. The database is employed for acoustic and language modeling as well as for construction of a question-and-answer database. Since the system input is not text but speech, the database also enables research on open-domain speech-based information access. Apart from that, research on unsupervised acoustic modeling, language modeling, and system portability can be carried out. A performance evaluation of the system in an early stage as well as in a late stage, when two years of real-environment data are used to construct all system components, shows the relative importance of developing each system component. The system's response accuracy is 83% for adults and 68% for children.

    Web of Science

    researchmap

  • Real-time continuous speech recognition system on SH-4A microprocessor Reviewed

    Hiroaki Kokubo, Nobuo Hataoka, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING   35 - +   2007

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    To expand CSR (continuous speech recognition) software to mobile environments, we have developed an embedded version of Julius (embedded Julius). Julius is open-source CSR software and has been used by many researchers and developers in Japan as a standard decoder on PCs. In this paper, we describe an implementation of the embedded Julius on an SH-4A microprocessor. The SH-4A is a high-end 32-bit MPU (720 MIPS) with an on-chip FPU. However, further computational reduction is necessary for the embedded Julius to operate in real time. Applying several optimizations, the embedded Julius achieves real-time processing on the SH-4A. The experimental results show 0.89 x RT (real time), 4.0 times faster than the baseline CSR. We also evaluated the embedded Julius on a large vocabulary (20,000 words). It shows almost real-time processing (1.25 x RT).

    Web of Science

    researchmap

  • Hyperparameter estimation for speech recognition based on variational Bayesian approach

    Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of ASA & ASJ Joint Meeting   3042 - 3042   2006.11

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • 実環境における子供音声認識のための音韻モデルおよび教師なし話者適応の評価 Reviewed

    鮫島充, Randy Gomez, 李晃伸, 猿渡洋, 鹿野清宏

    情報処理学会論文誌   47 ( 7 )   2295 - 2304   2006.07

     More details

    Language:Japanese   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • An HMM-based Singing Voice Synthesis System Reviewed

    Keijiro Saino, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5   2274 - 2277   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    The present paper describes a corpus-based singing voice synthesis system based on hidden Markov models (HMMs). This system employs HMM-based speech synthesis to synthesize singing voices. Musical information such as lyrics, tones, and durations is modeled simultaneously in a unified framework of context-dependent HMMs. It can mimic the voice quality and singing style of the original singer. Results of a singing voice synthesis experiment show that the proposed system can synthesize smooth and natural-sounding singing voices.

    Web of Science

    researchmap

  • Voice Conversion Based on Mixtures of Factor Analyzers Reviewed

    Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5   2278 - +   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC  

    This paper describes voice conversion based on Mixtures of Factor Analyzers (MFA), which can provide efficient modeling with a limited amount of training data. As a typical spectral conversion method, a mapping algorithm based on the Gaussian Mixture Model (GMM) has been proposed. In this method, two kinds of covariance matrix structures are often used: diagonal and full covariance matrices. A GMM with diagonal covariance matrices requires a large number of mixture components to accurately estimate spectral features. On the other hand, a GMM with full covariance matrices needs sufficient training data to estimate the model parameters. To cope with these problems, we apply MFA to voice conversion. MFA can be regarded as an intermediate model between a GMM with diagonal covariance and one with full covariance. Experimental results show that MFA can improve the conversion accuracy compared with the conventional GMM.
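    Why MFA sits between the two covariance structures can be seen from the parameter count of a single factor analyzer, which models the covariance as low-rank plus diagonal, Sigma = Lambda Lambda^T + Psi. A minimal sketch; the feature dimension and factor count below are illustrative assumptions, not values from the paper:

```python
import numpy as np

d, k = 40, 5  # assumed feature dimension and number of factors (illustrative)
rng = np.random.default_rng(0)

Lam = rng.normal(size=(d, k))        # factor loading matrix (Lambda)
psi = np.abs(rng.normal(size=d))     # diagonal noise variances (Psi)
Sigma = Lam @ Lam.T + np.diag(psi)   # full-rank covariance from few parameters

params_diag = d                  # diagonal covariance Gaussian
params_fa = d * k + d            # factor analyzer: loadings + diagonal
params_full = d * (d + 1) // 2   # full covariance Gaussian
```

    With d = 40 and k = 5, the factor analyzer stores 240 parameters per component, versus 40 for a diagonal covariance and 820 for a full one.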

    Web of Science

    researchmap

  • Reducing Computation on Parallel Decoding using Frame-wise Confidence Scores Reviewed

    Tomohiro Hakamata, Akinobu Lee, Yoshihiko Nankaku, Keiichi Tokuda

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5   1638 - 1641   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    Parallel decoding based on multiple models has been studied as a way to cover various conditions and speakers at a time in a speech recognition system. However, running many recognizers in parallel with all models causes the total computational cost to grow in proportion to the number of models. In this paper, an efficient way of finding and pruning unpromising decoding processes during search is proposed. By comparing temporal search statistics at each frame among all decoders, decoders with relatively unmatched models can be pruned in the middle of the recognition process to save computational cost. This method allows the model structures to be mutually independent. Two frame-wise pruning measures, based on maximum hypothesis likelihoods and top confidence scores respectively, and their combination are investigated. Experimental results on parallel recognition with seven acoustic models showed that by using both criteria, the total computational cost was reduced to 36.53% of full computation without degrading recognition accuracy.

    Web of Science

    researchmap

  • Hidden semi-Markov model based speech recognition system using weighted finite-state transducer Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13   33 - 36   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    In hidden Markov models (HMMs), state duration probabilities decrease exponentially with time, which is an inappropriate representation of the temporal structure of speech. One of the solutions to this problem is integrating state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM) [1]. Although a number of attempts to use explicit duration models in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding.
    In the present paper, a fully consistent speech recognition system based on the HSMM framework is proposed. In a speaker-dependent continuous speech recognition experiment, the HSMM-based speech recognition system achieved about 5.9% relative error reduction over the corresponding HMM-based one.

    Web of Science

    researchmap

  • Embedded Julius: Continuous speech recognition software for microprocessor Reviewed

    Hiroaki Kokubo, Nobuo Hataoka, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    2006 IEEE WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING   378 - +   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    To expand CSR (continuous speech recognition) software to mobile environments, we have developed an embedded version of "Julius". Julius is open-source CSR software and has been used by many researchers and developers in Japan as a standard decoder on PCs. Julius works as a real-time decoder on a PC; however, further computational reduction is necessary to run Julius on a microprocessor. To reduce the cost of calculating pdfs (probability density functions), Julius adopts a GMS (Gaussian Mixture Selection) method. In this paper, we modify the GMS method to realize a continuous speech recognizer on microprocessors. This approach does not change the structure of the acoustic models, keeping consistency with those used by conventional Julius, and enables developers to use acoustic models built with popular modeling tools. In simulation, the proposed method achieved a 20% reduction in computational cost compared to conventional GMS, and a 40% reduction compared to no GMS. Finally, the embedded version of Julius was tested on a development hardware platform named "T-Engine". The proposed method showed an RTF (Real Time Factor) of 2.23, 79% of that without GMS, with no degradation of recognition performance.

    Web of Science

    researchmap

  • Embedded julius on T-Engine platform Reviewed

    Nobuo Hataoka, Hiroaki Kokubo, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    2006 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS, VOLS 1 AND 2   37 - +   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    In this paper, we report implementation results of an embedded version of Julius. We used T-Engine (TM), which has a SuperH microprocessor, as the hardware platform. Julius is free and open Continuous Speech Recognition (CSR) software running on Personal Computers (PCs), which have large CPU power and storage memory. The technical problems in building an embedded version of Julius are reducing its computation and memory usage. We realized an RTF (Real Time Factor) of 2.23 for embedded speech recognition with a 5,000-word vocabulary, without any degradation of recognition accuracy.

    Web of Science

    researchmap

  • Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents. Reviewed

    Shinichi Kawamoto, Hiroshi Shimodaira, Tsuneo Nitta, Takuya Nishimoto, Satoshi Nakamura, Katsunobu Itou, Shigeo Morishima, Tatsuo Yotsukura, Atsuhiko Kai, Akinobu Lee, Yoichi Yamashita, Takao Kobayashi, Keiichi Tokuda, Keikichi Hirose, Nobuaki Minematsu, Atsushi Yamada, Yasuharu Den, Takehito Utsuro, Shigeki Sagayama

    Life-like characters - tools, affective functions, and applications.   187 - 212   2004

     More details

    Publisher:Springer  

    researchmap

  • Recent progress of open-source LVCSR engine Julius and Japanese model repository - Software of continuous speech recognition consortium

    Tatsuya Kawahara, Akinobu Lee, Kazuya Takeda, Katsunobu Itou, Kiyohiro Shikano

    8th International Conference on Spoken Language Processing, ICSLP 2004   3069 - 3072   2004

     More details

    Publishing type:Research paper (international conference proceedings)  

    The Continuous Speech Recognition Consortium (CSRC) was founded for further enhancement of the Japanese Dictation Toolkit, which had been developed with the support of a Japanese agency. An overview of its product software is reported in this paper. The open-source LVCSR (large vocabulary continuous speech recognition) engine Julius has been improved in both performance and functionality, and it has also been ported to Microsoft Windows in compliance with SAPI (Speech API). The software is now used for a number of languages and in many applications. For plug-and-play speech recognition in various applications, we have also compiled a repository of acoustic and language models for Japanese. In particular, the set of acoustic models provides wide coverage of user generations and speech-input environments.

    Scopus

    researchmap

  • Real-time word confidence scoring using local posterior probabilities on tree trellis search

    Akinobu Lee, Kiyohiro Shikano, Tatsuya Kawahara

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   1   I793 - I796   2004

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Confidence scoring based on word posterior probability is usually performed as a post-process of speech recognition decoding, and it needs a large number of word hypotheses to obtain sufficient confidence quality. We propose a simple way of computing word confidence using estimated posterior probabilities while decoding. At the word expansion of the stack decoding search, the local sentence likelihoods that contain heuristic scores of the unreached segment are directly used to compute the posterior probabilities. Experimental results showed that, although the likelihoods are not optimal, they can provide slightly better confidence measures than N-best lists, while the computation is faster than the 100-best method because no N-best decoding is required.
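    The normalization step behind such a confidence score can be sketched as follows: given the log likelihoods of the competing hypotheses at one expansion point, the posterior-style confidence of a hypothesis is its share of the summed (exponentiated) likelihoods. This is a generic log-sum-exp sketch of the idea, not the actual Julius implementation; the `scale` factor stands in for the likelihood smoothing that is needed in practice before exponentiation.

```python
import math

def local_word_confidence(log_scores, target, scale=0.1):
    """Posterior-style confidence of hypothesis `target`, given the
    log likelihoods of all competing hypotheses at one expansion point."""
    scaled = [s * scale for s in log_scores]
    m = max(scaled)  # log-sum-exp for numerical stability
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return math.exp(scaled[target] - log_total)
```

    The confidences of all competing hypotheses sum to one by construction, which is what makes the score interpretable as a posterior.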

    Scopus

    researchmap

  • Development of Anthropomorphic Spoken Dialogue Agent Toolkit

    Sagayama,Shigeki, Itou,Katsunobu, Utsuro,Takehito, Kai,Atsuhiko, Kobayashi,Takao, Shimodaira,Hiroshi, Den,Yasuharu, Tokuda,Keiichi, Nakamura,Satoshi, Nishimoto,Takuya, Nitta,Tsuneo, Hirose,Keikichi, Minematsu,Nobuaki, Morishima,Shigeo, Yamashita,Yoichi, Yamada,Atsushi, Lee,Akinobu

    IPSJ SIG Notes   2003 ( 124 )   319 - 324   2003.12

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:The Institute of Electronics, Information and Communication Engineers (IEICE)  

    researchmap

  • Galatea : An Anthropomorphic Spoken Dialogue Agent Toolkit

    Sagayama,Shigeki, Kawamoto,Shin-ichi, Shimodaira,Hiroshi, Nitta,Tsuneo, Nishimoto,Takuya, Nakamura,Satoshi, Itou,Katsunobu, Morishima,Shigeo, Yotsukura,Tatsuo, Kai,Atsuhiko, Lee,Akinobu, Yamashita,Yoichi, Kobayashi,Takao, Tokuda,Keiichi, Hirose,Keikichi, Minematsu,Nobuaki, Yamada,Atsushi, Den,Yasuharu, Utsuro,Takehito

    IPSJ SIG Notes   2003 ( 14 )   57 - 64   2003.02

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Information Processing Society of Japan (IPSJ)  

    researchmap

  • Complemental Back-off Algorithm for Merging Language Models

    NAGATOMO KENTARO, NISIMURA RYUICHI, KOMATSU KUMIKO, KURODA YUKA, LEE AKINOBU, SARUWATARI HIROSHI, SHIKANO KIYOHIRO

    IPSJ Journal   43 ( 9 )   2884 - 2893   2002.09

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    A new complemental back-off algorithm for merging two N-gram language models is proposed. By merging several topic-dependent or style-dependent models, we can easily construct a general model that covers a wider range of topics. However, conventional methods that simply concatenate the training corpora or interpolate the probabilities often level off the task-dependent characteristics of each language model and weaken the overall linguistic constraint. We propose a new back-off scheme that assigns unseen N-gram probabilities according to the probabilities of the other model. It assigns more reliable probabilities to the unseen N-grams, and no original corpus is needed for the merging. We implemented a command-line tool that realizes this method and evaluated it on three recognition tasks (medical consulting, food recipe query, and newspaper article). The results reveal that our merged model retains the same accuracy as each original one.
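    The complemental back-off idea can be illustrated on toy unigram models: entries seen in model A keep A's probabilities, and entries unseen in A receive B's probabilities rescaled to fill A's leftover probability mass. This is a simplified sketch of the scheme under assumed dict-based models, not the paper's exact N-gram formulation.

```python
def complemental_merge(model_a, model_b):
    """Merge two unigram models (dicts word -> probability).

    Words seen in model A keep A's probability; words unseen in A get
    model B's probability renormalized into A's leftover mass, so the
    merged distribution still sums to 1. Toy sketch of the idea."""
    unseen = {w: p for w, p in model_b.items() if w not in model_a}
    leftover = 1.0 - sum(model_a.values())   # mass A left undistributed
    norm = sum(unseen.values())              # B's mass over unseen words
    merged = dict(model_a)
    for w, p in unseen.items():
        merged[w] = leftover * p / norm
    return merged
```

    Note that only the two model files are consulted; as in the abstract, the original training corpora are not needed for the merge.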

    CiNii Articles

    researchmap

  • Design of Software Toolkit for Anthropomorphic Spoken Dialog Agent Software with Customization-Oriented Features Reviewed

    Shin-ichi Kawamoto, Hiroshi Shimodaira, Tsuneo Nitta, Takuya Nishimoto, Satoshi Nakamura, Katsunobu Itou, Shigeo Morishima, Tatsuo Yotsukura, Atsuhiko Kai, Akinobu Lee, Yoichi Yamashita, Takao Kobayashi, Keiichi Tokuda, Keikichi Hirose, Nobuaki Minematsu, Atsushi Yamada, Yasuharu Den, Takehito Utsuro, Shigeki Sagayama

    Transactions of Information Processing Society of Japan   43 ( 7 )   2249 - 2264   2002.05

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Information Processing Society of Japan  

    researchmap

  • Project for Development of Anthropomorphic Spoken-Dialog Agent

    SAGAYAMA,Shigeki, ITOU,Katsunobu, UTSURO,Takehito, KAI,Atsuhiko, KOBAYASHI,Takao, SHIMODAIRA,Hiroshi, DEN,Yasuharu, TOKUDA,Keiichi, NAKAMURA,Satoshi, NISHIMOTO,Takuya, NITTA,Tsuneo, HIROSE,Keikichi, MORISHIMA,Shigeo, MINEMATSU,Nobuaki, YAMASHITA,Yoichi, YAMADA,Atsushi, LEE,Akinobu

    Proceedings of the Meeting of the Acoustical Society of Japan   2002 ( 1 )   27 - 28   2002.03

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • A Design of Anthropomorphic Spoken Dialog Agent Toolkit

    Kawamoto,Shin-ichi, Shimodaira,Hiroshi, Nitta,Tsuneo, Nishimoto,Takuya, Nakamura,Satoshi, Itou,Katsunobu, Morishima,Shigeo, Yotsukura,Tatsuo, Kai,Atsuhiko, Lee,Akinobu, Yamashita,Yoichi, Kobayashi,Takao, Tokuda,Keiichi, Hirose,Keikichi, Minematsu,Nobuaki, Yamada,Atsushi, Den,Yasuharu, Utsuro,Takehito, Sagayama,Shigeki

    IPSJ SIG Technical Reports, Human Interface (HI)   2002 ( 10 )   61 - 66   2002.02

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Information Processing Society of Japan (IPSJ)  

    This paper discusses the design and architecture of a software toolkit for developing an anthropomorphic spoken dialog agent (ASDA) that is easy to customize. Such a human-like voice dialogue agent is one of the promising man-machine interfaces for the next generation. The paper first discusses the basic requirements an ASDA system should satisfy, and then designs the software modules of the system to fulfill those requirements. A prototype agent system has been developed on UNIX-based systems using the toolkit, which is under development. Finally, the current achievements of the toolkit, which will become publicly available as free software, are discussed.

    researchmap

  • Japanese Dictation Toolkit --- 1999 version --- Reviewed

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    The Journal of the Acoustical Society of Japan   57 ( 3 )   210 - 214   2001.03

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Acoustical Society of Japan  

    DOI: 10.20697/jasj.57.3_210

    researchmap

  • Julius - An open source real-time large vocabulary recognition engine

    Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology   1691 - 1694   2001

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:International Speech Communication Association  

    Julius is a high-performance, two-pass LVCSR decoder for researchers and developers. Based on word 3-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs in a 20k-word dictation task. Major search techniques are fully incorporated, such as tree lexicon, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning, and Gaussian selection. Besides search efficiency, it is carefully modularized to be independent of model structures, and various HMM types are supported, such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted for interoperability with other free modeling toolkits. The main platform is Linux and other Unix workstations, and it partially works on Windows. Julius is distributed under an open license together with its source code, and has been used by many researchers and developers in Japan.

    Scopus

    researchmap

  • Gaussian mixture selection using context-independent HMM Reviewed

    Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   1   69 - 72   2001

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    We address a method to efficiently select Gaussian mixtures for fast acoustic likelihood computation. It makes use of context-independent models for the selection and back-off of the corresponding triphone models. Specifically, for the k-best phone models in a preliminary evaluation, triphone models of higher resolution are applied, and the others are assigned likelihoods from the monophone models. This selection scheme assigns more reliable back-off likelihoods to the unselected states than conventional Gaussian selection based on a VQ codebook. It can also incorporate efficient Gaussian pruning in the preliminary evaluation, which offsets the increased size of the pre-selection model. Experimental results show that the proposed method achieves performance comparable to standard Gaussian selection, and performs much better under aggressive pruning conditions. Together with phonetic tied-mixture (PTM) modeling, the acoustic matching cost is reduced to almost 14% with little loss of accuracy.
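    The selection-and-back-off step described above can be sketched as follows: monophone scores act as a cheap pre-selector per frame, only the k-best phones receive a full triphone evaluation, and the rest keep the monophone score as a back-off. A minimal sketch; the data layout and names are assumptions, not the paper's implementation.

```python
def select_and_score(mono_scores, triphone_score, k):
    """Gaussian mixture selection sketch for one frame.

    mono_scores: dict phone -> monophone log-likelihood (preliminary pass).
    triphone_score: callable giving the higher-resolution triphone
    log-likelihood for a phone (illustrative interface).
    Only the k best phones by monophone score are re-scored with
    triphones; the others back off to the monophone likelihood."""
    ranked = sorted(mono_scores, key=mono_scores.get, reverse=True)
    selected = set(ranked[:k])
    return {p: (triphone_score(p) if p in selected else mono_scores[p])
            for p in mono_scores}
```

    Backing off to an actual monophone likelihood, rather than a flat penalty as in VQ-codebook selection, is what gives the unselected states more reliable scores.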

    DOI: 10.1109/ICASSP.2001.940769

    Scopus

    researchmap

  • Large Vocabulary Continuous Speech Recognition using Multi-Pass Search Algorithm Reviewed

    Akinobu Lee

    2000.09

     More details

    Language:English   Publishing type:Doctoral thesis  

  • Japanese Dictation Toolkit --- 1998 version --- Reviewed

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    The Journal of the Acoustical Society of Japan   56 ( 4 )   255 - 259   2000.04

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Acoustical Society of Japan  

    researchmap

  • Free software toolkit for Japanese large vocabulary continuous speech recognition. Reviewed

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Shigeki Sagayama, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    Sixth International Conference on Spoken Language Processing, ICSLP 2000 / INTERSPEECH 2000, Beijing, China, October 16-20, 2000   476 - 479   2000

     More details

  • A new phonetic tied-mixture model for efficient decoding Reviewed

    Akinobu Lee, Tatsuya Kawahara, Kazuya Takeda, Kiyohiro Shikano

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   3   1269 - 1272   2000

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Institute of Electrical and Electronics Engineers Inc.  

    A phonetic tied-mixture (PTM) model for efficient large vocabulary continuous speech recognition is presented. It is synthesized from context-independent phone models with 64 mixture components per state by assigning different mixture weights according to the shared states of triphones. The mixtures are then re-estimated for optimization. The model achieves a word error rate of 7.0% on a 20,000-word newspaper dictation task, which is comparable to the best figure obtained by triphones of much higher resolution. Compared with conventional PTMs that share Gaussians across all states, the proposed model is easily trained and reliably estimated. Furthermore, the model enables the decoder to perform efficient Gaussian pruning. It was found that computing only two out of the 64 components causes no loss of accuracy. Several methods for the pruning are proposed and compared, and the best one reduced the computation to about 20%.
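    The pruned mixture evaluation can be illustrated as follows: since the Gaussians of the shared codebook are scored once per frame, each tied state need only sum its k best weighted components instead of all 64. An illustrative sketch assuming log-domain scores; function and argument names are not from the paper.

```python
import math

def ptm_state_likelihood(gauss_log_probs, mix_weights, k=2):
    """Pruned tied-mixture state log-likelihood.

    gauss_log_probs: per-frame log-densities of the shared codebook
    Gaussians (computed once, reused by every tied state).
    mix_weights: this state's mixture weights.
    Only the k largest weighted components enter the log-sum-exp,
    mirroring the finding that k = 2 of 64 loses no accuracy."""
    weighted = [g + math.log(w) for g, w in zip(gauss_log_probs, mix_weights)]
    top = sorted(weighted, reverse=True)[:k]
    m = max(top)  # log-sum-exp with max subtracted for stability
    return m + math.log(sum(math.exp(x - m) for x in top))
```

    Because the dominant components carry almost all of the mixture mass, the pruned sum is a tight lower bound on the full 64-component likelihood.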

    DOI: 10.1109/ICASSP.2000.861808

    Scopus

    researchmap

  • Japanese Dictation Toolkit --- 1997 version --- Reviewed

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    The Journal of the Acoustical Society of Japan   55 ( 3 )   175 - 180   1999.03

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Acoustical Society of Japan  

    DOI: 10.20697/jasj.55.3_175

    researchmap
