Papers - LEE Akinobu

Showing all papers: 1 - 135 of 135
  • Data generation for speaker diarization by speaker transition information Reviewed

    Keigo Ichikawa, Sei Ueno, and Akinobu Lee

    Asia Pacific Signal and Information Processing Association (APSIPA)   2024.12

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

    Other Link: https://www.apsipa2023.org/tprogram.html

  • Generation of speech-laugh speech using laughter representations from a large-scale pre-trained model

    木全亮太朗, 上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2024.09

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Refining Synthesized Speech Using Speaker Information and Phone Masking for Data Augmentation of Speech Recognition Reviewed

Sei Ueno, Akinobu Lee, Tatsuya Kawahara

    IEEE/ACM Transactions on Audio, Speech, and Language Processing   32   3924 - 3933   2024.09

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    DOI: 10.1109/TASLP.2024.3451982

    researchmap

    Other Link: https://repository.kulib.kyoto-u.ac.jp/dspace/handle/2433/289487

  • Multi-setting acoustic feature training for data augmentation of speech recognition Reviewed

    Sei Ueno, Akinobu Lee

    Acoustical Science and Technology   45 ( 4 )   195 - 203   2024.07

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (scientific journal)  

DOI: 10.1250/ast.e23.70

    researchmap

    Other Link: https://www.jstage.jst.go.jp/article/ast/45/4/45_e23.70/_article/-char/ja

  • A relationship-maintenance support system using chat dialogue aimed at collecting and conveying experiential information

    志満津 奈央, 上乃 聖, 李 晃伸

    Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing   1394 - 1399   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

    Other Link: https://www.anlp.jp/proceedings/annual_meeting/2024/index.html

  • Construction and evaluation of an Emotional Support Conversation system using large language models

    藤田 敦也, 上乃 聖, 李 晃伸

    Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing   1378 - 1383   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

    Other Link: https://www.anlp.jp/proceedings/annual_meeting/2024/index.html

  • A hierarchical story summarization method emphasizing emotion using sentiment analysis

    酒井 健壱, 上乃 聖, 李 晃伸

    Proceedings of the 30th Annual Meeting of the Association for Natural Language Processing   1119 - 1124   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

    Other Link: https://www.anlp.jp/proceedings/annual_meeting/2024/index.html

  • Data generation for speaker diarization using speaker-turn information for three or more speakers

    市川 奎吾, 上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Speech synthesis adapted to dialogue scenes based on latent speaking styles in Japanese daily conversation

    嶋崎 純一, 上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Speech synthesis using a diffusion model with implicit nonlinear processing

    岡本 海, 上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Domain adaptation of speech recognition via speech synthesis using LLM-generated text

    上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Synthesis of non-native voice with native-like accent using voice conversion

    Iago Lourenço Correa, Sei Ueno, and Akinobu Lee

    2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

    Other Link: https://acoustics.jp/annualmeeting/program/

  • A self-projection method for a sense of shared space in spoken dialogue systems using CG agents

    東 省吾, 上乃 聖, 李 晃伸

    HAI Symposium   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • A spoken dialogue system supporting the verbalization of worry structures in counseling

    鈴木 香保, 上乃 聖, 李 晃伸

    HAI Symposium   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • An immersive spoken dialogue system using an HMD for rich nonverbal communication

    宮下 陸, 上乃 聖, 李 晃伸

    HAI Symposium   2024.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Accent-Preserving Voice Conversion between Native-Nonnative Speakers for Second Language Learning Reviewed

    Iago Lourenço Correa, Sei Ueno, and Akinobu Lee

    Asia Pacific Signal and Information Processing Association (APSIPA)   2023.11

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

    Other Link: https://www.apsipa2023.org/tprogram.html

  • Collection of Voice Control Utterances During Driving Using Dialogue System with Question-Answering Database and Large Language Model

    2023.10

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Data augmentation for speech recognition using speech synthesis that introduces variation along the time and frequency axes

    上乃聖, 李晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2023.09

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Controlling voiced and unvoiced intervals for generating laughter in different styles

    木全 亮太朗, 上乃 聖, 李晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2023.09

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Data generation for speaker diarization of real conversational speech incorporating turn-taking frequency

    市川 奎吾, 上乃 聖, 李晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2023.09

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Automatic generation of head motion and facial expressions from speech in CG avatar dialogue

    藤岡 侑貴, 上乃 聖, 李晃伸

    Annual Conference of the Japanese Society for Artificial Intelligence (JSAI)   2023.06

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Data augmentation for speech recognition based on speech synthesis using spectrograms with multiple settings

    上乃聖, 李晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2023.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Multi-task learning of voice activity detection and turn-end detection using Continuous Integrate-and-Fire

    池口 弘尚, 東 佑樹, 上乃 聖, 李 晃伸

    Proceedings of the Acoustical Society of Japan Meeting   2023.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Evaluation of a counseling dialogue agent with continuous emotional expression

    川又 朱莉, 上乃 聖, 李 晃伸

    HAI Symposium   2023.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Controlling voiced and unvoiced intervals for diverse laughter generation

    木全亮太朗, 上乃 聖, 李 晃伸

    Proceedings of the IPSJ National Convention   2023.03

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Development of a CG agent with high lifelikeness and presence for hybrid autonomous/teleoperated dialogue systems

    李晃伸, 石黒浩

    96th JSAI SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD) (13th Dialogue System Symposium)   2022.12

     More details

    Authorship:Lead author, Corresponding author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A 2D-CG spoken dialogue system using entrainment via a self-projected avatar

    東省吾, 李晃伸

    Human Interface Symposium 2022   2T-P2   2022.09

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Spectrogram enhancement of synthesized speech using speaker information and masking for data augmentation of speech recognition

    上乃 聖, 李 晃伸, 河原 達也

    Proceedings of the Acoustical Society of Japan Meeting   1149 - 1150   2022.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Performance comparison of transfer learning for negative-emotion recognition in automated spoken dialogue

    高井幸輝, 李晃伸, 戸田隆道, 東佑樹, 下山翔

    93rd Meeting of the JSAI SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD)   2021.11

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Eliciting user utterances during silence in automated voice response systems

    西山達也, 李晃伸, 戸田隆道, 友松祐太, 杉山雅和

    90th Meeting of the JSAI SIG on Spoken Language Understanding and Dialogue Processing (SIG-SLUD)   2020.11

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Context and Knowledge Aware Dialogue System and System Combination for Grounded Response Generation Reviewed International coauthorship International journal

    Ryota Tanaka, Akihide Ozeki, Shugo Kato, Akinobu Lee

    Computer Speech & Language   62   2020.07

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Elsevier  

    DOI: 10.1016/j.csl.2020.101070

    researchmap

    Other Link: https://www.sciencedirect.com/science/article/pii/S0885230820300036

  • Fact-based Dialogue Generation with Convergent and Divergent Decoding International journal

    Ryota Tanaka, Akinobu Lee

    arXiv   2020.05

     More details

    Language:English   Publishing type:Research paper (other academic)  

    Fact-based dialogue generation is the task of generating a human-like response based on both dialogue context and factual texts. Various methods have been proposed that focus on effectively generating informative words containing facts. However, previous works implicitly assume that a topic is maintained throughout a dialogue and usually converse passively, so these systems have difficulty generating diverse responses that proactively provide meaningful information. This paper proposes an end-to-end fact-based dialogue system augmented with the ability of convergent and divergent thinking over both context and facts, which can converse about the current topic or introduce a new one. Specifically, our model incorporates a novel convergent and divergent decoding scheme that can generate informative and diverse responses considering not only the given inputs (context and facts) but also topics related to those inputs. Both automatic and human evaluation results on the DSTC7 dataset show that our model significantly outperforms state-of-the-art baselines, indicating that it can generate more appropriate, informative, and diverse responses.

    arXiv

    researchmap

  • Speaker-Aware BERT for Multi-Party Dialog Response Selection Reviewed International coauthorship International journal

    Tatsuya Nishiyama, Ryota Tanaka, Yuya Ishijima, Akinobu Lee

    Proc. AAAI2020 Dialogue System Technology Challenge 8 workshop   2020.02

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

    Other Link: https://sites.google.com/dstc.community/dstc8/aaai-20-workshop

  • Pronunciation proficiency classification of second-language learners using phoneme posterior probabilities of a language pair

    森凜太朗, 李晃伸

    IEICE Technical Committee on Speech (IEICE-SP)   2019.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Emotional expression generation using a boosting framework that emphasizes individual speaking styles

    尾関晃英, 李晃伸

    IPSJ SIG Natural Language Processing (IPSJ-NL)   2019.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A neural dialogue model grounded in external knowledge with a topic-expansion module

    田中涼太, 李晃伸

    IPSJ SIG Natural Language Processing (IPSJ-NL)   2019.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Ensemble Dialogue System for Facts-Based Sentence Generation Reviewed International coauthorship International journal

    Ryota Tanaka, Akihide Ozeki, Shugo Kato, Akinobu Lee

    Proc. AAAI2019 Dialogue System Technology Challenge 7 workshop   2019.01

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    arXiv

    researchmap

    Other Link: http://workshop.colips.org/dstc7/workshop.html

  • Machine Learning and Language Learning: English Conversation Simulator and its Design for Language Learning Reviewed

    KIMURA Mitsushige, LEE Akinobu, KAWASHIMA Hiroaki

    Journal of The Society of Instrument and Control Engineers   58 ( 11 )   873 - 877   2019

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:The Society of Instrument and Control Engineers  

    CiNii Articles

    researchmap

  • An ensemble dialogue system using external factual information and dialogue history

    田中涼太, 尾関晃英, 加藤修悟, 李晃伸

    SIG-SLUD   2018.11

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    This study aims to avoid "safe responses" by conditioning on context and external facts extracted from information websites (e.g., Wikipedia), and then generating responses grounded in real-world facts. The system is an ensemble dialogue system consisting of three sub-modules: a generation-based module, a facts-retrieval module, and a reranking module. Responses can thus be determined from various viewpoints by combining multiple systems. Experiments and evaluations were conducted on the sentence generation task of Dialog System Technology Challenges 7, where our system performed significantly better than many competing systems.

    researchmap

  • Response selection using phoneme information based on recurrent neural networks

    牧野 健一郎, 李 晃伸

    IPSJ SIG Technical Report (Spoken Language Processing)   2017-SLP-117 ( 4 )   1 - 6   2017.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Information Processing Society of Japan  

    researchmap

  • Automatic construction of a pronunciation dictionary robust to spontaneous speech using statistical G2P learning in Japanese

    寺田 卓矢, 李 晃伸

    IPSJ SIG Technical Report (Spoken Language Processing)   2017-SLP-117 ( 11 )   1 - 6   2017.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Information Processing Society of Japan  

    researchmap

  • Report on the international conference ICASSP 2017

    浅見 太一, 大谷 大和, 岡本 拓磨, 小川 哲司, 落合 翼, 亀岡 弘和, 駒谷 和範, 高木 信二, 高道 慎之介, 俵 直弘, 南條 浩輝, 橋本 佳, 福田 隆, 増村 亮, 松田 繁樹, 李 晃伸, 渡部 晋治

    117th IPSJ SIG-SLP Meeting, SLP-3   2017.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • User generated dialogue systems: uDialogue Reviewed

    Keiichi Tokuda, Akinobu Lee, Yoshihiko Nankaku, Keiichiro Oura, Kei Hashimoto, Daisuke Yamamoto, Ichi Takumi, Takahiro Uchiya, Shuhei Tsutsumi, Steve Renals, Junichi Yamagishi

    Human-Harmonized Information Technology   2   77 - 114   2017.04

     More details

    Language:English   Publishing type:Part of collection (book)   Publisher:Springer Japan  

    This chapter introduces the idea of user-generated dialogue content and describes our experimental exploration aimed at clarifying the mechanism and conditions that make it workable in practice. One of the attractive points of a speech interface is that it provides a vivid sense of interactivity that cannot be achieved with a text interface alone. This study proposes a framework in which spoken dialogue systems are separated into content that can be produced and modified by users and the systems that drive that content, and seeks to clarify (1) the requirements of systems that enable the creation of attractive spoken dialogue, and (2) the conditions for the active generation of attractive dialogue content by users, while attempting to establish a method for realizing them. Experiments for validating user-generated dialogue content were performed by installing interactive digital signage with a speech interface in public spaces as a dialogue device and implementing a content generation environment for users via the Internet. The proposed framework is expected to lead to a breakthrough in the spread of speech technology.

    DOI: 10.1007/978-4-431-56535-2_3

    Scopus

    researchmap

  • Investigation of relationship between speaking affordability and expressing common environments and knowledge in spoken dialogue system

    78 ( 78 )   125 - 128   2016.10

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    CiNii Articles

    CiNii Books

    researchmap

  • Analysis of relationship in psychological characteristics at short meetings for affable spoken dialogue systems

    78 ( 78 )   129 - 134   2016.10

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    CiNii Articles

    CiNii Books

    researchmap

  • A study of system speech-rate control based on user speech rate and utterance content for user-friendly spoken dialogue systems

    三原 寛哉, 李 晃伸

    IPSJ SIG Technical Report (SLP)   2016-SLP-112 ( 15 )   1 - 6   2016.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Information Processing Society of Japan  

    researchmap

  • Module specifications and management methods for open-content spoken dialogue systems

    山西 元樹, 船谷内 泰斗, 李 晃伸

    IPSJ SIG Technical Report (SLP)   2016-SLP-112 ( 14 )   1 - 6   2016.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)   Publisher:Information Processing Society of Japan  

    researchmap

  • Investigating the relationship between system-initiated talk and the perception of otherness in spoken dialogue systems

    村上拓也, 李 晃伸, 西川 由里, 小島 良広, 遠藤 充

    HAI Symposium 2015   238 - 243   2015.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A study of appropriate ways to express multitasking capability in spoken dialogue interfaces

    小中 彩貴, 李 晃伸

    HAI Symposium 2015   108 - 112   2015.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Evaluating affordance through expressed reactions to the acoustic environment in spoken dialogue systems

    夏目 龍司, 李 晃伸

    HAI Symposium 2015   94 - 98   2015.12

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A user-generated spoken dialogue system enabling collaborative construction and extension of dialogues with history

    宮木 京介, 飯塚 遼, 李 晃伸

    Proceedings of the 2015 Autumn Meeting of the Acoustical Society of Japan   3-Q-22   2015.09

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Incremental mid-utterance confirmation of keywords using word confidence based on non-shared inter-word nodes

    松尾 涼平, 小林 大晃, 李 晃伸

    Proceedings of the 2015 Autumn Meeting of the Acoustical Society of Japan   3-Q-12   2015.09

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Prosodically-enhanced Recurrent Neural Network Language Models Reviewed

    Siva Reddy Gangireddy, Steve Renals, Yoshihiko Nankaku, Akinobu Lee

    16TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION (INTERSPEECH 2015), VOLS 1-5   2390 - 2394   2015

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC  

    Recurrent neural network language models have been shown to consistently reduce the word error rates (WERs) of large vocabulary speech recognition tasks. In this work we propose to enhance the RNNLMs with prosodic features computed using the context of the current word. Since it is plausible to compute the prosody features at the word and syllable level we have trained the models on prosody features computed at both these levels. To investigate the effectiveness of proposed models we report perplexity and WER for two speech recognition tasks, Switchboard and TED. We observed substantial improvements in perplexity and small improvements in WER.

    Web of Science

    researchmap

  • Voice interaction system with 3D-CG virtual agent for stand-alone smartphones Reviewed

    Daisuke Yamamoto, Keiichiro Oura, Ryota Nishimura, Takahiro Uchiya, Akinobu Lee, Keiichi Tokuda, Ichi Takumi

    HAI 2014 - Proceedings of the 2nd International Conference on Human-Agent Interaction   323 - 330   2014.10

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Association for Computing Machinery, Inc  

    In this paper, we propose a voice interaction system using 3D-CG virtual agents for stand-alone smartphones. Unlike existing mobile voice interaction systems, the proposed system handles speech recognition and speech synthesis on the smartphone itself, enabling natural conversation without the delays caused by network communication. Moreover, the proposed system can be fully customized with dialogue scripts, Java-based plugins, and Android APIs, so developers can easily build original voice interaction systems for smartphones on top of it. We have made a subset of the proposed system available as open-source software. We expect that this system will contribute to studies of human-agent interaction using smartphones.

    DOI: 10.1145/2658861.2658874

    Scopus

    researchmap

  • Robust response selection using phoneme sequences in a statistical spoken dialogue system

    佐伯 昌幸, 李 晃伸

    101st Meeting of IPSJ SIG-SLP   2014.05

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A study on improving incentives through mutual stimulation between creators and users in a user-generated spoken dialogue system

    飯塚 遼, 李 晃伸

    101st Meeting of IPSJ SIG-SLP   2014.05

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A low-latency speech interface using incremental early confirmation of hypotheses based on conditional random fields

    伊神 陽介, 李 晃伸, 徳田 恵一, 南角 吉彦

    Proceedings of the Acoustical Society of Japan Meeting   63 - 64   2014.03

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • A study of pruning word-specific nodes based on word confidence in large-vocabulary continuous speech recognition

    小林 大晃, 伊藤 直晃, 李 晃伸

    Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan   2014.03

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Response selection based on hypothesis generation that prioritizes neighborhood word of keywords in statistical spoken dialogue system

    3-Q5-13 - 224   2014.03

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    CiNii Articles

    researchmap

  • A low-latency speech interface using incremental early confirmation of hypotheses based on conditional random fields

    伊神 陽介, 李 晃伸, 徳田 恵一, 南角 吉彦

    Proceedings of the 2014 Spring Meeting of the Acoustical Society of Japan   2-4-7   2014.03

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A study of a concise dialogue description method based on finite-state transducers for user-generated spoken dialogue content

    船谷内 泰斗, 大浦 圭一郎, 南角 吉彦, 李 晃伸, 徳田 恵一

    Proceedings of the Acoustical Society of Japan Meeting   223 - 224   2013.09

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • MMDAGENT - A FULLY OPEN-SOURCE TOOLKIT FOR VOICE INTERACTION SYSTEMS Reviewed International journal

    Akinobu Lee, Keiichiro Oura, Keiichi Tokuda

    2013 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP)   8382 - 8385   2013.05

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper describes the development of an open-source toolkit that makes it possible to explore a wide variety of aspects of speech interaction in spoken dialog systems and speech interfaces. The toolkit tightly integrates recent speech recognition and synthesis technologies with a 3-D CG rendering module that can manipulate expressive embodied agent characters. The software design and its interfaces are carefully crafted to make it a fully open toolkit. Ongoing public demonstration experiments indicate that it is promoting related research and development of voice interaction systems in various settings.

    DOI: 10.1109/ICASSP.2013.6639300

    Web of Science

    researchmap

  • Development of "Smart Mei-chan," a 3D spoken-dialogue agent running on a stand-alone smartphone

    山本 大介, 大浦 圭一郎, 李 晃伸, et al.

    IPSJ Interaction   675 - 680   2013.03

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Development, deployment, and operation of a user-participatory interactive voice-guidance digital signage system Invited

    徳田恵一, 大浦圭一郎, 李晃伸, 山本大介, 打矢隆弘, 内匠逸

    Proceedings of the 2013 Spring Meeting of the Acoustical Society of Japan   119 - 122   2013.03

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • On-Campus, User-Participatable, and Voice-Interactive Digital Signage(<Special Issue>Practical Issues of Spoken Dialogue Systems) Reviewed

    Keiichiro Oura, Daisuke Yamamoto, Ichi Takumi, Akinobu Lee, Keiichi Tokuda

    28 ( 1 )   60 - 67   2013.01

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    CiNii Articles

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1004/00008160/

  • Technical Advances of Speech-Oriented Guidance System "Takemaru-kun" by 10 Years of Long-Term Operation(<Special Issue>Practical Issues of Spoken Dialogue Systems) Reviewed

Ryuichi Nishimura, Sunao Hara, Hiromichi Kawanami, Akinobu Lee, Kiyohiro Shikano

    Journal of the Japanese Society for Artificial Intelligence   28 ( 1 )   52 - 59   2013.01

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:The Japanese Society for Artificial Intelligence  

    DOI: 10.11517/jjsai.28.1_52

    CiNii Articles

    CiNii Books

    researchmap

    Other Link: http://id.nii.ac.jp/1004/00008159/

  • Automatic character estimation related to driver sociability

    神沼 充伸, 西崎 友規子, ブエ・ステファン, 南角 吉彦, 李 晃伸

    Human Interface 2012 Proceedings   2012.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • A statistical spoken dialogue system based on tight coupling of speech recognition and response selection using registered keywords and a general-purpose language model

    平野隆司, 加藤杏樹, 南角吉彦, 李晃伸, 徳田恵一

    IPSJ SIG Technical Report   2012-SLP-92 ( 3 )   1 - 6   2012.07

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • A campus event registration system for interactive voice digital signage

    山本大介, 大浦圭一郎, 李晃伸, 打矢隆弘, 内匠逸, 徳田恵一, 松尾啓志

    University ICT Promotion Council, 2011 Annual Conference   2011.12

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • MMDAgent: an open-source toolkit for building engaging voice interaction systems

    李晃伸, 大浦圭一郎, 徳田恵一

    Technical Report of IEICE   1 - 6   2011.12

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Evaluation of a low-latency incremental hypothesis confirmation algorithm in continuous speech recognition

    大野博之, 南角吉彦, 李晃伸, 徳田恵一

    Proceedings of the 2011 Autumn Meeting of the Acoustical Society of Japan   45 - 46   2011.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Bayesian Context Clustering Using Cross Validation for Speech Recognition Reviewed

    Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E94-D ( 3 )   668 - 678   2011.03

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    This paper proposes Bayesian context clustering using cross validation for hidden Markov model (HMM) based speech recognition. The Bayesian approach is a statistical technique for estimating reliable predictive distributions by treating model parameters as random variables. The variational Bayesian method, which is widely used as an efficient approximation of the Bayesian approach, has been applied to HMM-based speech recognition, and it shows good performance. Moreover, the Bayesian approach can select an appropriate model structure while taking account of the amount of training data. Since prior distributions which represent prior information about model parameters affect estimation of the posterior distributions and selection of model structure (e.g., decision tree based context clustering), the determination of prior distributions is an important problem. However, it has not been thoroughly investigated in speech recognition, and the determination technique of prior distributions has not performed well. The proposed method can determine reliable prior distributions without any tuning parameters and select an appropriate model structure while taking account of the amount of training data. Continuous phoneme recognition experiments show that the proposed method achieved a higher performance than the conventional methods.

    DOI: 10.1587/transinf.E94.D.668

    Web of Science

    researchmap

  • Evaluation of Tree-trellis based Decoding in Over-million LVCSR Reviewed

    Naoaki Ito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    12TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2011 (INTERSPEECH 2011), VOLS 1-5   1948 - 1951   2011

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC  

    Very large vocabulary continuous speech recognition (CSR) that can recognize every sentence is one of the important goals in speech recognition. Several attempts have been made to achieve very large vocabulary CSR. However, very large vocabulary CSR using a tree-trellis based decoder has not been reported. We report the performance evaluation and improvement of the "Julius" tree-trellis based decoder in large vocabulary CSR (LVCSR) with a vocabulary of more than one million words, referred to here as over-million LVCSR. Experiments indicated that Julius achieved a word accuracy of about 91% and a real-time factor of about 2 in over-million LVCSR for Japanese newspaper speech transcription.

    Web of Science

    researchmap

  • Speech recognition based on statistical models including multiple phonetic decision trees Reviewed

    Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Acoustical Science and Technology   32 ( 6 )   236 - 243   2011

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    We propose a speech recognition technique using multiple model structures. In the use of context-dependent models, decision-tree-based context clustering is applied to find an appropriate parameter-tying structure. However, context clustering is usually performed on the basis of unreliable statistics of hidden Markov model (HMM) state sequences, because estimating reliable state sequences requires an appropriate model structure, which cannot be obtained prior to context clustering. Therefore, context clustering and the estimation of state sequences essentially cannot be performed independently. To overcome this problem, we propose an optimization technique for state sequences based on an annealing process using multiple decision trees. In this technique, a new likelihood function is defined in order to treat multiple model structures, and the deterministic annealing expectation maximization algorithm is used as the training algorithm. Experimental continuous phoneme recognition results show that the proposed method, using only two decision trees, achieved about an 11.1% relative error reduction over the conventional method. © 2011 The Acoustical Society of Japan.

    DOI: 10.1250/ast.32.236

    Scopus

    researchmap

  • Voice activity detection based on conditional random fields using multiple features (co-authored) Reviewed

    Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proc. Conference of the International Speech Communication Association (INTERSPEECH)   2086 - 2089   2010.09

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Speaker Adaptation Based on Nonlinear Spectral Transform for Speech Recognition (co-authored) Reviewed

    Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proc. Conference of the International Speech Communication Association (INTERSPEECH)   542 - 545   2010.09

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

  • A Covariance-Tying Technique for HMM-Based Speech Synthesis Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   93 ( 3 )   595 - 601   2010.03

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.
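    The footprint saving from tying can be illustrated with a back-of-the-envelope parameter count. This is a minimal sketch, assuming diagonal covariances; the feature dimension and state count below are illustrative assumptions, not figures from the paper:

```python
d = 40           # assumed feature dimension (illustrative)
n_states = 2000  # assumed number of clustered state distributions (illustrative)

# Untied: each state stores a mean vector and its own diagonal covariance.
untied_params = n_states * (d + d)

# Tied: each state stores only a mean; a single diagonal covariance is shared.
tied_params = n_states * d + d

print(untied_params, tied_params, tied_params / untied_params)
```

    Under these assumptions, tying roughly halves the stored parameters; the saving grows further if full covariance matrices are stored per state.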

    DOI: 10.1587/transinf.E93.D.595

    Web of Science

    researchmap

  • 音声認識のデコーダと認識エンジン Reviewed

    李晃伸

    日本音響学会誌   66 ( 1 )   28 - 31   2010.01

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • Speaker Adaptation Based on Nonlinear Spectral Transform for Speech Recognition Reviewed

    Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2   542 - 545   2010

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    This paper proposes a speaker adaptation technique using a nonlinear spectral transform based on GMMs. One of the most popular forms of speaker adaptation is based on linear transforms, e.g., MLLR. Although MLLR uses multiple transforms according to regression classes, only a single linear transform is applied to each state. The proposed method performs nonlinear speaker adaptation based on a new likelihood function combining HMMs for recognition with GMMs for spectral transform. Moreover, the dependency of transforms on context can also be estimated in an integrated ML fashion. The proposed technique outperformed conventional approaches in phoneme-recognition experiments.

    Web of Science

    researchmap

  • Voice activity detection based on conditional random fields using multiple features Reviewed

    Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4   2086 - 2089   2010

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    This paper proposes a Voice Activity Detection (VAD) algorithm based on Conditional Random Fields (CRF) using multiple features. VAD is a technique used to distinguish between speech and non-speech in noisy environments and is an important component in many real-world speech applications. The posterior probability of output labels in the proposed method is directly modeled by the weighted sum of the feature functions. Effective features are automatically selected by estimating appropriate weight parameters to improve the accuracy of VAD. Experimental results on the CENSREC-1-C database revealed that the proposed approach can decrease error rates by using CRF.

    Web of Science

    researchmap

  • Computational Reduction of Continuous Speech Recognition Software "Julius" on SuperH Microprocessor Reviewed

    50 ( 11 )   2597 - 2606   2009.11

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    CiNii Articles

    CiNii Books

    researchmap

  • Development of a Toolkit for Spoken Dialog System with an Anthropomorphic Agent: Galatea Reviewed

    Kouichi Katsurada, Akinobu Lee, Tatsuya Kawahara, Tatsuo Yotsukura, Shigeo Morishima, Takuya Nishimoto, Yoichi Yamashita, and Tsuneo Nitta

    Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)   148 - 153   2009.10

     More details

    Language:English   Publishing type:Research paper (other academic)  

    researchmap

  • Recent Development of Open-Source Speech Recognition Engine Julius Reviewed

    Akinobu Lee and Tatsuya Kawahara

    Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)   131 - 137   2009.10

     More details

    Language:English   Publishing type:Research paper (other academic)  

    researchmap

  • Tying Covariance Matrices to Reduce the Footprint of HMM-based Speech Synthesis Systems Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda

    Proc. Conference of the International Speech Communication Association (INTERSPEECH)   1759 - 1762   2009.09

     More details

    Language:English   Publishing type:Research paper (other academic)  

  • 総合報告 ユーザ負担のない話者・環境適応性を実現する自然な音声対話処理技術の総合開発

    鹿野清宏, 武田一哉, 河原達也, 河原英紀, 猿渡洋, 徳田恵一, 李 晃伸, 川波弘道, 西村竜一, Randy GOMEZ, 戸田智基, 西浦敬信, 高橋 徹, 坂野秀樹, 全 炳河

    電子情報通信学会誌   92 ( 6 )   2009.06

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • Voice Conversion based on Simultaneous Modeling of Spectrum and F0 Reviewed

    Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda

    Proc. IEEE International Conference on Acoustics, Speech and Signal Processing   3897 - 3900   2009.04

     More details

    Language:English   Publishing type:Research paper (other academic)  

  • Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5   1723 - 1726   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    This paper proposes a technique for reducing the footprint of HMM-based speech synthesis systems by tying all covariance matrices. HMM-based speech synthesis systems usually have smaller footprints than unit-selection synthesis systems because statistics rather than speech waveforms are stored. However, further reduction is essential to put them on embedded devices, which have very small memories. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results show that the proposed technique can shrink the footprint of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.

    Web of Science

    researchmap

  • VOICE CONVERSION BASED ON SIMULTANEOUS MODELING OF SPECTRUM AND F0 Reviewed

    Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS   3897 - 3900   2009

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper proposes simultaneous modeling of spectrum and F0 for voice conversion based on MSD (Multi-Space Probability Distribution) models. As a conventional technique, spectral conversion based on a GMM (Gaussian Mixture Model) has been proposed. Although this technique converts spectral feature sequences nonlinearly based on the GMM, F0 sequences are usually converted by a simple linear function, because F0 is undefined in unvoiced segments. To overcome this problem, we apply MSD models. The MSD-GMM allows continuous F0 values in voiced frames and a discrete symbol representing unvoiced frames to be modeled within a unified framework. Furthermore, the MSD-HMM is adopted to model long-term correlations in F0 sequences.

    Web of Science

    researchmap

  • Speaker recognition based on Gaussian mixture models using variational Bayesian method

    Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    電子情報通信学会技術研究報告   108 ( 338 )   185 - 190   2008.12

     More details

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Speech recognition based on statistical models including multiple decision trees

    Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    電子情報通信学会技術研究報告   108 ( 338 )   221 - 226   2008.12

     More details

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E91D ( 11 )   2693 - 2700   2008.11

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    In a hidden Markov model (HMM), state duration probabilities decrease exponentially with time, which fails to adequately represent the temporal structure of speech. One of the solutions to this problem is integrating state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM). However, though a number of attempts to use HSMMs in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding. By avoiding these approximations using a generalized forward-backward algorithm, a context-dependent duration modeling technique, and weighted finite-state transducers (WFSTs), we construct a fully consistent HSMM-based speech recognition system. In a speaker-dependent continuous speech recognition experiment, our system achieved about 9.1% relative error reduction over the corresponding HMM-based system.
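    The exponential-decay property the abstract refers to can be seen directly: with self-loop probability a, the probability of occupying a plain HMM state for exactly d frames is geometric, P(d) = a^(d-1)(1 - a), so the mode is always d = 1 and the distribution can never peak at a typical phone duration. A minimal sketch (the self-loop value is an arbitrary illustration):

```python
def hmm_state_duration_prob(d, self_loop=0.7):
    """Probability of staying in a plain HMM state for exactly d frames:
    (d - 1) self-transitions followed by one exit transition."""
    return (self_loop ** (d - 1)) * (1.0 - self_loop)

# strictly decreasing in d: the most likely duration is always one frame
probs = [hmm_state_duration_prob(d) for d in range(1, 6)]
```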

    DOI: 10.1093/ietisy/e91-d.11.2693

    Web of Science

    researchmap

  • Acoustic modeling based on model structure annealing for speech recognition Reviewed

    Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of Interspeech 2008   932 - 935   2008.09

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • 複数の音素決定木を用いた音声認識の検討

    塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    日本音響学会2008年秋季研究発表会講演論文集   125 - 126   2008.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Speaker recognition based on variational Bayesian method Reviewed

    Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of Interspeech 2008   1417 - 1420   2008.09

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition Reviewed

    Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of Interspeech 2008   936 - 939   2008.09

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • クロスバリデーションを用いたベイズ基準によるコンテキストクラスタリング

    橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    日本音響学会2008年春季研究発表会講演論文集   69 - 70   2008.03

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • 変分ベイズ法に基づく話者認識

    伊藤達也, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    日本音響学会2008年春季研究発表会講演論文集   143 - 144   2008.03

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System. Reviewed

    Tobias Cincarek, Hiromichi Kawanami, Ryuichi Nisimura, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano

    IEICE Transactions   91-D ( 3 )   576 - 587   2008

     More details

  • Probabilistic Answer Selection Based on Conditional Random Fields for Spoken Dialog System Reviewed

    Yoshitaka Yoshimi, Ryota Kakitsuba, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5   215 - 218   2008

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    Probabilistic answer selection for a spoken dialog system based on Conditional Random Fields (CRFs) is described. The probabilities of answers for a question are trained by CRFs based on the lexical and morphological properties of each word, and the most likely answer given the recognized word sequence of the question utterance is chosen as the system output. Various sets of feature functions were evaluated on real data from a speech-oriented information kiosk system, and it is shown that the morphological properties have a positive effect on response accuracy. Training with the recognizer output of the training database instead of manual transcriptions was also investigated. It was also shown that the proposed scheme can achieve higher accuracy than conventional keyword-based answer selection.

    Web of Science

    researchmap

  • 変分ベイズ法に基づく音声認識のためのハイパーパラメータの共有構造

    橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    日本音響学会2007年秋季研究発表会講演論文集   139 - 142   2007.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • 音声認識のための音素決定木構造のアニーリングに基づく音響モデリング

    塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    日本音響学会2007年秋季研究発表会講演論文集   143 - 146   2007.09

     More details

    Language:Japanese   Publishing type:Research paper (other academic)  

    researchmap

  • 音素決定木構造のアニーリングに基づく音響モデリング

    塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    電子情報通信学会技術研究報告   107 ( 165 )   67 - 72   2007.07

     More details

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    researchmap

  • Speech Recognition Techniques for Real-World Robot Application

    LEE Akinobu, NISHIMURA Ryuichi

    Journal of The Society of Instrument and Control Engineers   46 ( 6 )   441 - 446   2007.06

     More details

    Language:Japanese   Publisher:The Society of Instrument and Control Engineers  

    DOI: 10.11499/sicejl1962.46.441

    CiNii Articles

    CiNii Books

    researchmap

    Other Link: https://jlc.jst.go.jp/DN/JALC/00295524175?from=CiNii

  • Insights gained from development and long-term operation of a real-environment speech-oriented guidance system Reviewed

    Tobias Cincarek, Ryuichi Nisimura, Akinobu Lee, Kiyohiro Shikano

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3   157 - +   2007

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper presents insights gained from operating a public speech-oriented guidance system. A real-environment speech database (300 hours) collected with the system over four years is described and analyzed regarding usage frequency, content, and diversity. With the first two years of the data completely transcribed, simulation of system development and evaluation of system performance over time are possible. The database is employed for acoustic and language modeling as well as for construction of a question-and-answer database. Since the system input is not text but speech, the database also enables research on open-domain speech-based information access. Apart from that, research on unsupervised acoustic modeling, language modeling, and system portability can be carried out. A performance evaluation of the system in an early stage as well as in a late stage, when two years of real-environment data are used to construct all system components, shows the relative importance of developing each system component. The system's response accuracy is 83% for adults and 68% for children.

    Web of Science

    researchmap

  • Real-time continuous speech recognition system on SH-4A microprocessor Reviewed

    Hiroaki Kokubo, Nobuo Hataoka, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING   35 - +   2007

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    To expand CSR (continuous speech recognition) software to mobile environments, we have developed an embedded version of Julius (embedded Julius). Julius is open-source CSR software and has been used by many researchers and developers in Japan as a standard decoder on PCs. In this paper, we describe an implementation of the embedded Julius on an SH-4A microprocessor. The SH-4A is a high-end 32-bit MPU (720 MIPS) with an on-chip FPU. However, further computational reduction is necessary for the embedded Julius to operate in real time. Applying several optimizations, the embedded Julius achieves real-time processing on the SH-4A. The experimental results show 0.89 x RT (real time), 4.0 times faster than the baseline CSR. We also evaluated the embedded Julius on a large vocabulary (20,000 words). It shows almost real-time processing (1.25 x RT).

    Web of Science

    researchmap

  • Hyperparameter estimation for speech recognition based on variational Bayesian approach

    Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of ASA & ASJ Joint Meeting   3042 - 3042   2006.11

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • 実環境における子供音声認識のための音韻モデルおよび教師なし話者適応の評価 Reviewed

    鮫島充, Randy Gomez, 李晃伸, 猿渡洋, 鹿野清宏

    情報処理学会論文誌   47 ( 7 )   2295 - 2304   2006.07

     More details

    Language:Japanese   Publishing type:Research paper (international conference proceedings)  

    researchmap

  • An HMM-based Singing Voice Synthesis System Reviewed

    Keijiro Saino, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5   2274 - 2277   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    The present paper describes a corpus-based singing voice synthesis system based on hidden Markov models (HMMs). This system employs HMM-based speech synthesis to synthesize singing voices. Musical information such as lyrics, tones, and durations is modeled simultaneously in a unified framework of context-dependent HMMs. It can mimic the voice quality and singing style of the original singer. Results of a singing voice synthesis experiment show that the proposed system can synthesize smooth and natural-sounding singing voices.

    Web of Science

    researchmap

  • Voice Conversion Based on Mixtures of Factor Analyzers Reviewed

    Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5   2278 - +   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC  

    This paper describes voice conversion based on Mixtures of Factor Analyzers (MFA), which can provide efficient modeling with a limited amount of training data. As a typical spectral conversion method, a mapping algorithm based on the Gaussian Mixture Model (GMM) has been proposed. In this method, two kinds of covariance matrix structures are often used: diagonal and full covariance matrices. A GMM with diagonal covariance matrices requires a large number of mixture components to accurately estimate spectral features. On the other hand, a GMM with full covariance matrices needs sufficient training data to estimate the model parameters. To cope with these problems, we apply MFA to voice conversion. MFA can be regarded as an intermediate model between a GMM with diagonal covariance and one with full covariance. Experimental results show that MFA can improve the conversion accuracy compared with the conventional GMM.
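    Why MFA sits between the two covariance structures can be seen from the parameter count of a single factor analyzer, which models the covariance as low-rank plus diagonal, Sigma = Lambda Lambda^T + Psi. A minimal sketch; the feature dimension and factor count below are illustrative assumptions, not values from the paper:

```python
import numpy as np

d, k = 40, 5  # assumed feature dimension and number of factors (illustrative)
rng = np.random.default_rng(0)

Lam = rng.normal(size=(d, k))        # factor loading matrix (Lambda)
psi = np.abs(rng.normal(size=d))     # diagonal noise variances (Psi)
Sigma = Lam @ Lam.T + np.diag(psi)   # full-rank covariance from few parameters

params_diag = d                  # diagonal covariance Gaussian
params_fa = d * k + d            # factor analyzer: loadings + diagonal
params_full = d * (d + 1) // 2   # full covariance Gaussian
```

    With d = 40 and k = 5, the factor analyzer stores 240 parameters per component, versus 40 for a diagonal covariance and 820 for a full one.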

    Web of Science

    researchmap

  • Reducing Computation on Parallel Decoding using Frame-wise Confidence Scores Reviewed

    Tomohiro Hakamata, Akinobu Lee, Yoshihiko Nankaku, Keiichi Tokuda

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5   1638 - 1641   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    Parallel decoding based on multiple models has been studied as a way to cover various conditions and speakers at a time in a speech recognition system. However, running many recognizers in parallel with all models causes the total computational cost to grow in proportion to the number of models. In this paper, an efficient way of finding and pruning unpromising decoding processes during search is proposed. By comparing temporal search statistics at each frame among all decoders, decoders with relatively unmatched models can be pruned in the middle of the recognition process to save computational cost. This method allows the model structures to be mutually independent. Two frame-wise pruning measures, based on maximum hypothesis likelihoods and top confidence scores respectively, and their combination are investigated. Experimental results on parallel recognition with seven acoustic models showed that by using both criteria, the total computational cost was reduced to 36.53% of full computation without degrading recognition accuracy.

    Web of Science

    researchmap

  • Hidden semi-Markov model based speech recognition system using weighted finite-state transducer Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13   33 - 36   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    In hidden Markov models (HMMs), state duration probabilities decrease exponentially with time, which is an inappropriate representation of the temporal structure of speech. One of the solutions to this problem is integrating state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM) [1]. Although a number of attempts to use explicit duration models in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding.
    In the present paper, a fully consistent speech recognition system based on the HSMM framework is proposed. In a speaker-dependent continuous speech recognition experiment, the HSMM-based speech recognition system achieved about 5.9% relative error reduction over the corresponding HMM-based one.

    Web of Science

    researchmap

  • Embedded Julius: Continuous speech recognition software for microprocessor Reviewed

    Hiroaki Kokubo, Nobuo Hataoka, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    2006 IEEE WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING   378 - +   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    To expand CSR (continuous speech recognition) software to mobile environments, we have developed an embedded version of "Julius". Julius is open-source CSR software and has been used by many researchers and developers in Japan as a standard decoder on PCs. Julius works as a real-time decoder on a PC; however, further computational reduction is necessary to run Julius on a microprocessor. To reduce the cost of calculating pdfs (probability density functions), Julius adopts a GMS (Gaussian Mixture Selection) method. In this paper, we modify the GMS method to realize a continuous speech recognizer on microprocessors. This approach does not change the structure of the acoustic models, keeping consistency with those used by conventional Julius, and enables developers to use acoustic models built with popular modeling tools. In simulation, the proposed method achieved a 20% reduction in computational cost compared to conventional GMS, and a 40% reduction compared to no GMS. Finally, the embedded version of Julius was tested on a development hardware platform named "T-Engine". The proposed method showed an RTF (Real Time Factor) of 2.23, 79% of that without GMS, with no degradation of recognition performance.

    Web of Science

    researchmap

  • Embedded julius on T-Engine platform Reviewed

    Nobuo Hataoka, Hiroaki Kokubo, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    2006 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS, VOLS 1 AND 2   37 - +   2006

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    In this paper, we report implementation results of an embedded version of Julius. We used T-Engine (TM), which has a SuperH microprocessor, as the hardware platform. Julius is free and open Continuous Speech Recognition (CSR) software running on Personal Computers (PCs), which have large CPU power and storage memory. The technical problems in building an embedded version of Julius are reducing its computation and memory usage. We realized an RTF (Real Time Factor) of 2.23 for embedded speech recognition with a 5,000-word vocabulary, without any degradation of recognition accuracy.

    Web of Science

    researchmap

  • Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents. Reviewed

    Shinichi Kawamoto, Hiroshi Shimodaira, Tsuneo Nitta, Takuya Nishimoto, Satoshi Nakamura, Katsunobu Itou, Shigeo Morishima, Tatsuo Yotsukura, Atsuhiko Kai, Akinobu Lee, Yoichi Yamashita, Takao Kobayashi, Keiichi Tokuda, Keikichi Hirose, Nobuaki Minematsu, Atsushi Yamada, Yasuharu Den, Takehito Utsuro, Shigeki Sagayama

    Life-like characters - tools, affective functions, and applications.   187 - 212   2004

     More details

    Publisher:Springer  

    researchmap

  • Recent progress of open-source LVCSR engine Julius and Japanese model repository - Software of continuous speech recognition consortium

    Tatsuya Kawahara, Akinobu Lee, Kazuya Takeda, Katsunobu Itou, Kiyohiro Shikano

    8th International Conference on Spoken Language Processing, ICSLP 2004   3069 - 3072   2004

     More details

    Publishing type:Research paper (international conference proceedings)  

    The Continuous Speech Recognition Consortium (CSRC) was founded for further enhancement of the Japanese Dictation Toolkit, which had been developed with the support of a Japanese agency. An overview of its product software is reported in this paper. The open-source LVCSR (large vocabulary continuous speech recognition) engine Julius has been improved in both performance and functionality, and it has also been ported to Microsoft Windows in compliance with SAPI (Speech API). The software is now used for a number of languages and in many applications. For plug-and-play speech recognition in various applications, we have also compiled a repository of acoustic and language models for Japanese. In particular, the set of acoustic models provides wide coverage of user generations and speech-input environments.

    Scopus

    researchmap

  • Real-time word confidence scoring using local posterior probabilities on tree trellis search

    Akinobu Lee, Kiyohiro Shikano, Tatsuya Kawahara

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   1   I793 - I796   2004

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)  

    Confidence scoring based on word posterior probability is usually performed as a post-process of speech recognition decoding, and it needs a large number of word hypotheses to obtain sufficient confidence quality. We propose a simple way of computing word confidence using estimated posterior probabilities while decoding. At the word expansion of the stack decoding search, the local sentence likelihoods that contain heuristic scores of the unreached segment are directly used to compute the posterior probabilities. Experimental results showed that, although the likelihoods are not optimal, they can provide slightly better confidence measures than N-best lists, while the computation is faster than the 100-best method because no N-best decoding is required.
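    The normalization step behind such a confidence score can be sketched as follows: given the log likelihoods of the competing hypotheses at one expansion point, the posterior-style confidence of a hypothesis is its share of the summed (exponentiated) likelihoods. This is a generic log-sum-exp sketch of the idea, not the actual Julius implementation; the `scale` factor stands in for the likelihood smoothing that is needed in practice before exponentiation.

```python
import math

def local_word_confidence(log_scores, target, scale=0.1):
    """Posterior-style confidence of hypothesis `target`, given the
    log likelihoods of all competing hypotheses at one expansion point."""
    scaled = [s * scale for s in log_scores]
    m = max(scaled)  # log-sum-exp for numerical stability
    log_total = m + math.log(sum(math.exp(s - m) for s in scaled))
    return math.exp(scaled[target] - log_total)
```

    The confidences of all competing hypotheses sum to one by construction, which is what makes the score interpretable as a posterior.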

    Scopus

    researchmap

  • Development of Anthropomorphic Spoken Dialogue Agent Toolkit

    Sagayama,Shigeki, Itou,Katsunobu, Utsuro,Takehito, Kai,Atsuhiko, Kobayashi,Takao, Shimodaira,Hiroshi, Den,Yasuharu, Tokuda,Keiichi, Nakamura,Satoshi, Nishimoto,Takuya, Nitta,Tsuneo, Hirose,Keikichi, Minematsu,Nobuaki, Morishima,Shigeo, Yamashita,Yoichi, Yamada,Atsushi, Lee,Akinobu

    IPSJ SIG Notes   2003 ( 124 )   319 - 324   2003.12

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:The Institute of Electronics, Information and Communication Engineers (IEICE)  

    researchmap

  • Galatea : An Anthropomorphic Spoken Dialogue Agent Toolkit

    Sagayama,Shigeki, Kawamoto,Shin-ichi, Shimodaira,Hiroshi, Nitta,Tsuneo, Nishimoto,Takuya, Nakamura,Satoshi, Itou,Katsunobu, Morishima,Shigeo, Yotsukura,Tatsuo, Kai,Atsuhiko, Lee,Akinobu, Yamashita,Yoichi, Kobayashi,Takao, Tokuda,Keiichi, Hirose,Keikichi, Minematsu,Nobuaki, Yamada,Atsushi, Den,Yasuharu, Utsuro,Takehito

    IPSJ SIG Notes   2003 ( 14 )   57 - 64   2003.02

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Information Processing Society of Japan (IPSJ)  

    researchmap

  • Complemental Back-off Algorithm for Merging Language Models

    NAGATOMO KENTARO, NISIMURA RYUICHI, KOMATSU KUMIKO, KURODA YUKA, LEE AKINOBU, SARUWATARI HIROSHI, SHIKANO KIYOHIRO

    IPSJ Journal   43 ( 9 )   2884 - 2893   2002.09

     More details

    Language:Japanese   Publisher:Information Processing Society of Japan (IPSJ)  

    A new complemental back-off algorithm for merging two N-gram language models is proposed. By merging several topic-dependent or style-dependent models, we can easily construct a general model that covers a wider range of topics. However, conventional methods that simply concatenate the training corpora or interpolate the probabilities often level off the task-dependent characteristics of each language model and weaken the overall linguistic constraint. We propose a new back-off scheme that assigns unseen N-gram probabilities according to the probabilities of the other model. It assigns more reliable probabilities to the unseen N-grams, and no original corpus is needed for the merging. We implemented a command-line tool that realizes this method and evaluated it on three recognition tasks (medical consulting, food recipe query, and newspaper article). The results reveal that our merged model retains the same accuracy as each original one.
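    The complemental back-off idea can be illustrated on toy unigram models: entries seen in model A keep A's probabilities, and entries unseen in A receive B's probabilities rescaled to fill A's leftover probability mass. This is a simplified sketch of the scheme under assumed dict-based models, not the paper's exact N-gram formulation.

```python
def complemental_merge(model_a, model_b):
    """Merge two unigram models (dicts word -> probability).

    Words seen in model A keep A's probability; words unseen in A get
    model B's probability renormalized into A's leftover mass, so the
    merged distribution still sums to 1. Toy sketch of the idea."""
    unseen = {w: p for w, p in model_b.items() if w not in model_a}
    leftover = 1.0 - sum(model_a.values())   # mass A left undistributed
    norm = sum(unseen.values())              # B's mass over unseen words
    merged = dict(model_a)
    for w, p in unseen.items():
        merged[w] = leftover * p / norm
    return merged
```

    Note that only the two model files are consulted; as in the abstract, the original training corpora are not needed for the merge.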

    CiNii Articles

    researchmap

  • Design of Software Toolkit for Anthropomorphic Spoken Dialog Agent Software with Customization-Oriented Features Reviewed

    Shin-ichi Kawamoto, Hiroshi Shimodaira, Tsuneo Nitta, Takuya Nishimoto, Satoshi Nakamura, Katsunobu Itou, Shigeo Morishima, Tatsuo Yotsukura, Atsuhiko Kai, Akinobu Lee, Yoichi Yamashita, Takao Kobayashi, Keiichi Tokuda, Keikichi Hirose, Nobuaki Minematsu, Atsushi Yamada, Yasuharu Den, Takehito Utsuro, Shigeki Sagayama

    Transactions of Information Processing Society of Japan   43 ( 7 )   2249 - 2264   2002.05

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Information Processing Society of Japan  

    researchmap

  • Project for Development of Anthropomorphic Spoken-Dialog Agent

    SAGAYAMA,Shigeki, ITOU,Katsunobu, UTSURO,Takehito, KAI,Atsuhiko, KOBAYASHI,Takao, SHIMODAIRA,Hiroshi, DEN,Yasuharu, TOKUDA,Keiichi, NAKAMURA,Satoshi, NISHIMOTO,Takuya, NITTA,Tsuneo, HIROSE,Keikichi, MORISHIMA,Shigeo, MINEMATSU,Nobuaki, YAMASHITA,Yoichi, YAMADA,Atsushi, LEE,Akinobu

    Proceedings of the Meeting of the Acoustical Society of Japan   2002 ( 1 )   27 - 28   2002.03

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)  

    researchmap

  • A Design of Anthropomorphic Spoken Dialog Agent Toolkit

    Kawamoto,Shin-ichi, Shimodaira,Hiroshi, Nitta,Tsuneo, Nishimoto,Takuya, Nakamura,Satoshi, Itou,Katsunobu, Morishima,Shigeo, Yotsukura,Tatsuo, Kai,Atsuhiko, Lee,Akinobu, Yamashita,Yoichi, Kobayashi,Takao, Tokuda,Keiichi, Hirose,Keikichi, Minematsu,Nobuaki, Yamada,Atsushi, Den,Yasuharu, Utsuro,Takehito, Sagayama,Shigeki

    IPSJ SIG Technical Reports, Human Interface (HI)   2002 ( 10 )   61 - 66   2002.02

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Information Processing Society of Japan (IPSJ)  

    This paper discusses the design and architecture of a software toolkit for developing an anthropomorphic spoken dialog agent (ASDA) that is easy to customize. Such a human-like voice dialogue agent is one of the promising man-machine interfaces for the next generation. The paper first discusses the basic requirements an ASDA system should satisfy, and then designs the software modules of the system to fulfill those requirements. A prototype agent system has been developed on UNIX-based systems using the toolkit, which is under development. Finally, the current achievements of the toolkit, which will become publicly available as free software, are discussed.

    researchmap

  • Japanese Dictation Toolkit --- 1999 version --- Reviewed

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    The Journal of the Acoustical Society of Japan   57 ( 3 )   210 - 214   2001.03

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Acoustical Society of Japan  

    DOI: 10.20697/jasj.57.3_210

    researchmap

  • Julius - An open source real-time large vocabulary recognition engine

    Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology   1691 - 1694   2001

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:International Speech Communication Association  

    Julius is a high-performance, two-pass LVCSR decoder for researchers and developers. Based on word 3-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs in a 20k-word dictation task. Major search techniques are fully incorporated, such as tree lexicon, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning, and Gaussian selection. Besides search efficiency, it is carefully modularized to be independent of model structures, and various HMM types are supported, such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted for interoperability with other free modeling toolkits. The main platform is Linux and other Unix workstations, and it partially works on Windows. Julius is distributed under an open license together with its source code, and has been used by many researchers and developers in Japan.

    Scopus

    researchmap

  • Gaussian mixture selection using context-independent HMM Reviewed

    Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   1   69 - 72   2001

     More details

    Language:English   Publishing type:Research paper (scientific journal)  

    We address a method to efficiently select Gaussian mixtures for fast acoustic likelihood computation. It makes use of context-independent models for the selection and back-off of the corresponding triphone models. Specifically, for the k-best phone models in a preliminary evaluation, triphone models of higher resolution are applied, and the others are assigned likelihoods from the monophone models. This selection scheme assigns more reliable back-off likelihoods to the unselected states than conventional Gaussian selection based on a VQ codebook. It can also incorporate efficient Gaussian pruning in the preliminary evaluation, which offsets the increased size of the pre-selection model. Experimental results show that the proposed method achieves performance comparable to standard Gaussian selection, and performs much better under aggressive pruning conditions. Together with phonetic tied-mixture (PTM) modeling, the acoustic matching cost is reduced to almost 14% with little loss of accuracy.
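    The selection-and-back-off step described above can be sketched as follows: monophone scores act as a cheap pre-selector per frame, only the k-best phones receive a full triphone evaluation, and the rest keep the monophone score as a back-off. A minimal sketch; the data layout and names are assumptions, not the paper's implementation.

```python
def select_and_score(mono_scores, triphone_score, k):
    """Gaussian mixture selection sketch for one frame.

    mono_scores: dict phone -> monophone log-likelihood (preliminary pass).
    triphone_score: callable giving the higher-resolution triphone
    log-likelihood for a phone (illustrative interface).
    Only the k best phones by monophone score are re-scored with
    triphones; the others back off to the monophone likelihood."""
    ranked = sorted(mono_scores, key=mono_scores.get, reverse=True)
    selected = set(ranked[:k])
    return {p: (triphone_score(p) if p in selected else mono_scores[p])
            for p in mono_scores}
```

    Backing off to an actual monophone likelihood, rather than a flat penalty as in VQ-codebook selection, is what gives the unselected states more reliable scores.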

    DOI: 10.1109/ICASSP.2001.940769

    Scopus

    researchmap

  • Large Vocabulary Continuous Speech Recognition using Multi-Pass Search Algorithm Reviewed

    Akinobu Lee

    2000.09

     More details

    Language:English   Publishing type:Doctoral thesis  

  • Japanese Dictation Toolkit --- 1998 version --- Reviewed

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    The Journal of the Acoustical Society of Japan   56 ( 4 )   255 - 259   2000.04

     More details

    Language:Japanese   Publishing type:Research paper (scientific journal)   Publisher:Acoustical Society of Japan  

    researchmap

  • Free software toolkit for Japanese large vocabulary continuous speech recognition. Reviewed

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Shigeki Sagayama, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    Sixth International Conference on Spoken Language Processing, ICSLP 2000 / INTERSPEECH 2000, Beijing, China, October 16-20, 2000   476 - 479   2000

     More details

  • A new phonetic tied-mixture model for efficient decoding Reviewed

    Akinobu Lee, Tatsuya Kawahara, Kazuya Takeda, Kiyohiro Shikano

    ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings   3   1269 - 1272   2000

     More details

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:Institute of Electrical and Electronics Engineers Inc.  

    A phonetic tied-mixture (PTM) model for efficient large vocabulary continuous speech recognition is presented. It is synthesized from context-independent phone models with 64 mixture components per state by assigning different mixture weights according to the shared states of triphones. The mixtures are then re-estimated for optimization. The model achieves a word error rate of 7.0% on a 20,000-word newspaper dictation task, which is comparable to the best figure obtained by triphones of much higher resolution. Compared with conventional PTMs that share Gaussians across all states, the proposed model is easily trained and reliably estimated. Furthermore, the model enables the decoder to perform efficient Gaussian pruning. It was found that computing only two out of the 64 components causes no loss of accuracy. Several methods for the pruning are proposed and compared, and the best one reduced the computation to about 20%.
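    The pruned mixture evaluation can be illustrated as follows: since the Gaussians of the shared codebook are scored once per frame, each tied state need only sum its k best weighted components instead of all 64. An illustrative sketch assuming log-domain scores; function and argument names are not from the paper.

```python
import math

def ptm_state_likelihood(gauss_log_probs, mix_weights, k=2):
    """Pruned tied-mixture state log-likelihood.

    gauss_log_probs: per-frame log-densities of the shared codebook
    Gaussians (computed once, reused by every tied state).
    mix_weights: this state's mixture weights.
    Only the k largest weighted components enter the log-sum-exp,
    mirroring the finding that k = 2 of 64 loses no accuracy."""
    weighted = [g + math.log(w) for g, w in zip(gauss_log_probs, mix_weights)]
    top = sorted(weighted, reverse=True)[:k]
    m = max(top)  # log-sum-exp with max subtracted for stability
    return m + math.log(sum(math.exp(x - m) for x in top))
```

    Because the dominant components carry almost all of the mixture mass, the pruned sum is a tight lower bound on the full 64-component likelihood.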

    DOI: 10.1109/ICASSP.2000.861808

    Scopus

    researchmap

  • Japanese Dictation Toolkit --- 1997 version --- Reviewed

    Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano

    The Journal of the Acoustical Society of Japan   55 ( 3 )   175 - 180   1999.03

     More details

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:Acoustical Society of Japan  

    DOI: 10.20697/jasj.55.3_175

    researchmap
