Papers - LEE Akinobu

  • Speaker Adaptation Based on Nonlinear Spectral Transform for Speech Recognition (co-authored) Reviewed

    Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proc. Conference of the International Speech Communication Association (INTERSPEECH)   542 - 545   2010.09

    Language:English   Publishing type:Research paper (international conference proceedings)  

  • A Covariance-Tying Technique for HMM-Based Speech Synthesis Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   93 ( 3 )   595 - 601   2010.03

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    A technique for reducing the footprints of HMM-based speech synthesis systems by tying all covariance matrices of state distributions is described. HMM-based speech synthesis systems usually leave smaller footprints than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have limited memory. In accordance with the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results showed that the proposed technique can shrink the footprints of an HMM-based speech synthesis system while retaining the quality of the synthesized speech.

    DOI: 10.1587/transinf.E93.D.595

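  • Decoders and Recognition Engines for Speech Recognition Reviewed

    Akinobu Lee

    The Journal of the Acoustical Society of Japan, Acoustical Society of Japan   66 ( 1 )   28 - 31   2010.01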

    Language:English   Publishing type:Research paper (scientific journal)  

  • Speaker Adaptation Based on Nonlinear Spectral Transform for Speech Recognition Reviewed

    Toyohiro Hayashi, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 1-2   542 - 545   2010

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    This paper proposes a speaker adaptation technique using a nonlinear spectral transform based on GMMs. One of the most popular forms of speaker adaptation is based on linear transforms, e.g., MLLR. Although MLLR uses multiple transforms according to regression classes, only a single linear transform is applied to each state. The proposed method performs nonlinear speaker adaptation based on a new likelihood function combining HMMs for recognition with GMMs for spectral transform. Moreover, the dependency of transforms on context can also be estimated in an integrated ML fashion. The proposed technique outperformed conventional approaches in phoneme-recognition experiments.

  • Voice activity detection based on conditional random fields using multiple features Reviewed

    Akira Saito, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    11TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2010 (INTERSPEECH 2010), VOLS 3 AND 4   2086 - 2089   2010

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    This paper proposes a Voice Activity Detection (VAD) algorithm based on Conditional Random Fields (CRF) using multiple features. VAD is a technique used to distinguish between speech and non-speech in noisy environments and is an important component in many real-world speech applications. The posterior probability of output labels in the proposed method is directly modeled by the weighted sum of the feature functions. Effective features are automatically selected by estimating appropriate weight parameters to improve the accuracy of VAD. Experimental results on the CENSREC-1-C database revealed that the proposed approach can decrease error rates by using CRF.

  • Computational Reduction of Continuous Speech Recognition Software "Julius" on SuperH Microprocessor Reviewed

    50 ( 11 )   2597 - 2606   2009.11

    Language:Japanese   Publishing type:Research paper (scientific journal)  

  • Development of a Toolkit for Spoken Dialog System with an Anthropomorphic Agent: Galatea Reviewed

    Kouichi Katsurada, Akinobu Lee, Tatsuya Kawahara, Tatsuo Yotsukura, Shigeo Morishima, Takuya Nishimoto, Yoichi Yamashita, and Tsuneo Nitta

    Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)   148 - 153   2009.10

    Language:English   Publishing type:Research paper (other academic)  

  • Recent Development of Open-Source Speech Recognition Engine Julius Reviewed

    Akinobu Lee and Tatsuya Kawahara

    Proc. Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)   131 - 137   2009.10

    Language:English   Publishing type:Research paper (other academic)  

  • Tying Covariance Matrices to Reduce the Footprint of HMM-based Speech Synthesis Systems Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda

    Proc. Conference of the International Speech Communication Association (INTERSPEECH)   1759 - 1762   2009.09

    Language:English   Publishing type:Research paper (other academic)  

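  • Comprehensive Report: Integrated Development of Natural Spoken Dialogue Processing Technology Achieving Speaker and Environment Adaptability Without User Burden

    Kiyohiro Shikano, Kazuya Takeda, Tatsuya Kawahara, Hideki Kawahara, Hiroshi Saruwatari, Keiichi Tokuda, Akinobu Lee, Hiromichi Kawanami, Ryuichi Nisimura, Randy GOMEZ, Tomoki Toda, Takanobu Nishiura, Toru Takahashi, Hideki Banno, Heiga Zen

    The Journal of the Institute of Electronics, Information and Communication Engineers   92 ( 6 )   2009.06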

    Language:Japanese   Publishing type:Research paper (scientific journal)  

  • Voice Conversion based on Simultaneous Modeling of Spectrum and F0 Reviewed

    Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, and Keiichi Tokuda

    Proc. IEEE International Conference on Acoustics, Speech and Signal Processing   3897 - 3900   2009.04

    Language:English   Publishing type:Research paper (other academic)  

  • Tying covariance matrices to reduce the footprint of HMM-based speech synthesis systems Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2009: 10TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2009, VOLS 1-5   1723 - 1726   2009

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    This paper proposes a technique for reducing the footprint of HMM-based speech synthesis systems by tying all covariance matrices. HMM-based speech synthesis systems usually have a smaller footprint than unit-selection synthesis systems because they store statistics rather than speech waveforms. However, further reduction is essential to put them on embedded devices, which have very little memory. Based on the empirical knowledge that covariance matrices have a smaller impact on the quality of synthesized speech than mean vectors, we propose a technique for clustering mean vectors while tying all covariance matrices. Subjective listening test results show that the proposed technique can shrink the footprint of an HMM-based speech synthesis system while retaining the quality of synthesized speech.

  • VOICE CONVERSION BASED ON SIMULTANEOUS MODELING OF SPECTRUM AND F0 Reviewed

    Kaori Yutani, Yosuke Uto, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    2009 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOLS 1-8, PROCEEDINGS   3897 - 3900   2009

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper proposes simultaneous modeling of spectrum and F0 for voice conversion based on MSD (Multi-Space Probability Distribution) models. As a conventional technique, spectral conversion based on a GMM (Gaussian Mixture Model) has been proposed. Although this technique converts spectral feature sequences nonlinearly based on the GMM, F0 sequences are usually converted by a simple linear function. This is because F0 is undefined in unvoiced segments. To overcome this problem, we apply MSD models. The MSD-GMM allows continuous F0 values in voiced frames and a discrete symbol representing unvoiced frames to be modeled within a unified framework. Furthermore, the MSD-HMM is adopted to model long-term correlations in F0 sequences.

  • Speaker recognition based on Gaussian mixture models using variational Bayesian method

    Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    IEICE Technical Report   108 ( 338 )   185 - 190   2008.12

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • Speech recognition based on statistical models including multiple decision trees

    Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    IEICE Technical Report   108 ( 338 )   221 - 226   2008.12

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • A Fully Consistent Hidden Semi-Markov Model-Based Speech Recognition System Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    IEICE TRANSACTIONS ON INFORMATION AND SYSTEMS   E91D ( 11 )   2693 - 2700   2008.11

    Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE-INST ELECTRONICS INFORMATION COMMUNICATIONS ENG  

    In a hidden Markov model (HMM), state duration probabilities decrease exponentially with time, which fails to adequately represent the temporal structure of speech. One solution to this problem is to integrate state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM). However, although a number of attempts to use HSMMs in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding. By avoiding these approximations using a generalized forward-backward algorithm, a context-dependent duration modeling technique, and weighted finite-state transducers (WFSTs), we construct a fully consistent HSMM-based speech recognition system. In a speaker-dependent continuous speech recognition experiment, our system achieved about 9.1% relative error reduction over the corresponding HMM-based system.

    DOI: 10.1093/ietisy/e91-d.11.2693

  • Acoustic modeling based on model structure annealing for speech recognition Reviewed

    Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of Interspeech 2008   932 - 935   2008.09

    Language:English   Publishing type:Research paper (international conference proceedings)  

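  • A Study of Speech Recognition Using Multiple Phonetic Decision Trees

    Sayaka Shiota, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of the 2008 Autumn Meeting of the Acoustical Society of Japan   125 - 126   2008.09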

    Language:Japanese   Publishing type:Research paper (other academic)  

  • Speaker recognition based on variational Bayesian method Reviewed

    Tatsuya Ito, Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of Interspeech 2008   1417 - 1420   2008.09

    Language:English   Publishing type:Research paper (international conference proceedings)  

  • Bayesian context clustering using cross valid prior distribution for HMM-based speech recognition Reviewed

    Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of Interspeech 2008   936 - 939   2008.09

    Language:English   Publishing type:Research paper (international conference proceedings)  
