Papers - LEE Akinobu

Displaying 101 - 120 of about 135
  • Context clustering based on a Bayesian criterion using cross-validation

    橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    Proceedings of the 2008 Spring Meeting of the Acoustical Society of Japan   69 - 70   2008.03

    Language:Japanese   Publishing type:Research paper (other academic)

  • Speaker recognition based on the variational Bayesian method

    伊藤達也, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    Proceedings of the 2008 Spring Meeting of the Acoustical Society of Japan   143 - 144   2008.03

    Language:Japanese   Publishing type:Research paper (other academic)

  • Development, Long-Term Operation and Portability of a Real-Environment Speech-Oriented Guidance System Reviewed

    Tobias Cincarek, Hiromichi Kawanami, Ryuichi Nisimura, Akinobu Lee, Hiroshi Saruwatari, Kiyohiro Shikano

    IEICE Transactions   91-D ( 3 )   576 - 587   2008

  • Probabilistic Answer Selection Based on Conditional Random Fields for Spoken Dialog System Reviewed

    Yoshitaka Yoshimi, Ryota Kakitsuba, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2008: 9TH ANNUAL CONFERENCE OF THE INTERNATIONAL SPEECH COMMUNICATION ASSOCIATION 2008, VOLS 1-5   215 - 218   2008

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    A probabilistic answer selection method for a spoken dialog system based on Conditional Random Fields (CRFs) is described. The probability of each answer given a question is trained by CRFs based on the lexical and morphological properties of each word, and the most likely answer for the recognized word sequence of the question utterance is chosen as the system output. Various sets of feature functions were evaluated on real data from a speech-oriented information kiosk system, and it is shown that the morphological properties have a positive effect on response accuracy. Training on recognizer output of the training database instead of manual transcriptions was also investigated. It was also shown that the proposed scheme achieves higher accuracy than conventional keyword-based answer selection.

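    As a rough illustration of the selection scheme above, the sketch below scores candidate answers with a plain log-linear model standing in for the paper's CRFs; the kiosk answers, feature names, and weights are invented for the example, and a real system would learn the weights from transcribed question-answer pairs.

        import math

        # Hypothetical question-answer table of a guidance kiosk.
        ANSWERS = {
            "hours":    "The building is open from 9 a.m. to 5 p.m.",
            "access":   "Take the bus from the station; about 10 minutes.",
            "restroom": "Restrooms are on every floor, next to the elevator.",
        }

        # Hand-set (feature, answer) weights; the paper trains these with CRFs.
        WEIGHTS = {
            ("word=open", "hours"): 1.2,  ("word=time", "hours"): 0.9,
            ("word=bus", "access"): 1.4,  ("word=station", "access"): 1.1,
            ("word=restroom", "restroom"): 2.0,
            ("pos=noun", "access"): 0.1,  # crude stand-in for morphology
        }

        def features(words):
            # Lexical features per word; a morphological analyzer would
            # contribute real part-of-speech features here.
            feats = ["word=" + w for w in words]
            feats += ["pos=noun" for w in words if w in ("bus", "station", "restroom")]
            return feats

        def answer_posterior(words):
            # Softmax over answers of summed feature weights (log-linear model).
            scores = {a: sum(WEIGHTS.get((f, a), 0.0) for f in features(words))
                      for a in ANSWERS}
            z = sum(math.exp(s) for s in scores.values())
            return {a: math.exp(s) / z for a, s in scores.items()}

        # Recognized word sequence of a question utterance (may contain ASR errors).
        post = answer_posterior("what time does the building open".split())
        best = max(post, key=post.get)
        print(best, round(post[best], 3), "->", ANSWERS[best])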

  • Hyperparameter sharing structures for speech recognition based on the variational Bayesian method

    橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    Proceedings of the 2007 Autumn Meeting of the Acoustical Society of Japan   139 - 142   2007.09

    Language:Japanese   Publishing type:Research paper (other academic)

  • Acoustic modeling based on annealing of phonetic decision trees for speech recognition

    塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    Proceedings of the 2007 Autumn Meeting of the Acoustical Society of Japan   143 - 146   2007.09

    Language:Japanese   Publishing type:Research paper (other academic)

  • Acoustic modeling based on annealing of phonetic decision trees

    塩田さやか, 橋本佳, 全炳河, 南角吉彦, 李晃伸, 徳田恵一

    IEICE Technical Report   107 ( 165 )   67 - 72   2007.07

    Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)

  • Speech Recognition Techniques for Real-World Robot Application

    LEE Akinobu, NISHIMURA Ryuichi

    Journal of The Society of Instrument and Control Engineers   46 ( 6 )   441 - 446   2007.06

    Language:Japanese   Publisher:The Society of Instrument and Control Engineers  

    DOI: 10.11499/sicejl1962.46.441

    Other Link: https://jlc.jst.go.jp/DN/JALC/00295524175?from=CiNii

  • Insights gained from development and long-term operation of a real-environment speech-oriented guidance system Reviewed

    Tobias Cincarek, Ryuichi Nisimura, Akinobu Lee, Kiyohiro Shikano

    2007 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL IV, PTS 1-3   157 - +   2007

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    This paper presents insights gained from operating a public speech-oriented guidance system. A real-environment speech database (300 hours) collected with the system over four years is described and analyzed with regard to usage frequency, content, and diversity. With the first two years of the data completely transcribed, simulation of system development and evaluation of system performance over time are possible. The database is employed for acoustic and language modeling as well as for constructing a question-and-answer database. Since the system input is speech rather than text, the database also enables research on open-domain speech-based information access. Apart from that, research on unsupervised acoustic modeling, language modeling, and system portability can be carried out. A performance evaluation of the system in an early stage, as well as in a late stage when two years of real-environment data are used to construct all system components, shows the relative importance of developing each system component. The system's response accuracy is 83% for adults and 68% for children.

  • Real-time continuous speech recognition system on SH-4A microprocessor Reviewed

    Hiroaki Kokubo, Nobuo Hataoka, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    2007 IEEE NINTH WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING   35 - +   2007

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    To extend CSR (continuous speech recognition) software to mobile environments, we have developed an embedded version of Julius (embedded Julius). Julius is open-source CSR software that has been used by many researchers and developers in Japan as a standard decoder on PCs. In this paper, we describe an implementation of embedded Julius on an SH-4A microprocessor. The SH-4A is a high-end 32-bit MPU (720 MIPS) with an on-chip FPU. However, further computational reduction is necessary for embedded Julius to operate in real time. Applying several optimizations, embedded Julius achieves real-time processing on the SH-4A. The experimental results show 0.89 x RT (real time), 4.0 times faster than the baseline CSR. We also evaluated embedded Julius on a large vocabulary (20,000 words), where it shows nearly real-time processing (1.25 x RT).

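    For readers unfamiliar with the real-time factor (RTF) figures quoted above, a quick back-of-envelope check, assuming the standard definition RTF = processing time / audio duration:

        # Reported: 0.89 x RT after optimization, "4.0 times faster" than baseline.
        def rtf(processing_seconds, audio_seconds):
            return processing_seconds / audio_seconds

        optimized = 0.89
        baseline = optimized * 4.0   # implies the baseline ran at ~3.56 x RT
        print(f"baseline ~{baseline:.2f} x RT, optimized {optimized:.2f} x RT")
        print("faster than real time:", optimized < 1.0)
        print("example:", rtf(processing_seconds=8.9, audio_seconds=10.0), "x RT")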

  • Hyperparameter estimation for speech recognition based on variational Bayesian approach

    Kei Hashimoto, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    Proceedings of ASA & ASJ Joint Meeting   3042 - 3042   2006.11

    Language:English   Publishing type:Research paper (international conference proceedings)

  • Evaluation of phoneme models and unsupervised speaker adaptation for children's speech recognition in real environments Reviewed

    鮫島充, Randy Gomez, 李晃伸, 猿渡洋, 鹿野清宏

    IPSJ Journal   47 ( 7 )   2295 - 2304   2006.07

    Language:Japanese   Publishing type:Research paper (scientific journal)

  • An HMM-based Singing Voice Synthesis System Reviewed

    Keijiro Saino, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5   2274 - 2277   2006

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    The present paper describes a corpus-based singing voice synthesis system based on hidden Markov models (HMMs). This system employs HMM-based speech synthesis to synthesize singing voices. Musical information such as lyrics, tones, and durations is modeled simultaneously in a unified framework of context-dependent HMMs. The system can mimic the voice quality and singing style of the original singer. Results of a singing voice synthesis experiment show that the proposed system can synthesize smooth and natural-sounding singing voices.

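    The sketch below illustrates the kind of context-dependent label the abstract refers to, combining a phoneme from the lyrics with its tone and note duration; the label format and the Note structure are invented for illustration and are not the actual HTS label specification.

        from dataclasses import dataclass

        @dataclass
        class Note:
            phoneme: str   # from the lyrics
            tone: str      # musical pitch, e.g. "E4"
            beats: float   # note duration

        def full_context_label(prev, cur, nxt):
            # One label per phoneme, embedding neighbors plus musical context,
            # analogous to triphone labels extended with tone and duration.
            return (f"{prev.phoneme}-{cur.phoneme}+{nxt.phoneme}"
                    f"/tone:{cur.tone}/beats:{cur.beats}")

        score = [Note("sil", "xx", 1.0), Note("a", "E4", 0.5), Note("i", "G4", 2.0)]
        labels = [full_context_label(score[i - 1], score[i], score[i + 1])
                  for i in range(1, len(score) - 1)]
        print(labels)   # ['sil-a+i/tone:E4/beats:0.5']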

  • Voice Conversion Based on Mixtures of Factor Analyzers Reviewed

    Yosuke Uto, Yoshihiko Nankaku, Tomoki Toda, Akinobu Lee, Keiichi Tokuda

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5   2278 - +   2006

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INT SPEECH COMMUNICATION ASSOC  

    This paper describes voice conversion based on Mixtures of Factor Analyzers (MFA), which provides efficient modeling with a limited amount of training data. As a typical spectral conversion method, a mapping algorithm based on the Gaussian Mixture Model (GMM) has been proposed. In this method, two kinds of covariance matrix structures are often used: diagonal and full covariance matrices. A GMM with diagonal covariance matrices requires a large number of mixture components to estimate spectral features accurately. On the other hand, a GMM with full covariance matrices needs sufficient training data to estimate its model parameters. To cope with these problems, we apply MFA to voice conversion. MFA can be regarded as an intermediate model between a GMM with diagonal covariance and one with full covariance. Experimental results show that MFA improves conversion accuracy compared with the conventional GMM.

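    A small numeric sketch of the trade-off described above, assuming the usual factor-analysis covariance structure Sigma = Lambda Lambda^T + Psi with d-dimensional features and k factors; the dimensions are arbitrary examples, not the paper's settings.

        import numpy as np

        d, k = 24, 4   # feature dimension and number of factors (arbitrary)

        params = {
            "diagonal GMM": d,                 # one variance per dimension
            "MFA":          d * k + d,         # loadings Lambda plus diagonal Psi
            "full GMM":     d * (d + 1) // 2,  # full symmetric covariance
        }
        for name, n in params.items():
            print(f"{name:>12}: {n} covariance parameters per mixture component")

        # An MFA covariance is low-rank plus diagonal, so it models
        # cross-dimension correlations with far fewer parameters than full.
        rng = np.random.default_rng(0)
        Lam = rng.normal(size=(d, k))
        Psi = np.diag(rng.uniform(0.1, 1.0, size=d))
        Sigma = Lam @ Lam.T + Psi
        print("positive definite:", bool(np.all(np.linalg.eigvalsh(Sigma) > 0)))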

  • Reducing Computation on Parallel Decoding using Frame-wise Confidence Scores Reviewed

    Tomohiro Hakamata, Akinobu Lee, Yoshihiko Nankaku, Keiichi Tokuda

    INTERSPEECH 2006 AND 9TH INTERNATIONAL CONFERENCE ON SPOKEN LANGUAGE PROCESSING, VOLS 1-5   1638 - 1641   2006

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:ISCA-INST SPEECH COMMUNICATION ASSOC  

    Parallel decoding based on multiple models has been studied as a way to cover various conditions and speakers at once in a speech recognition system. However, running many recognizers in parallel, one per model, causes the total computational cost to grow in proportion to the number of models. In this paper, an efficient way of finding and pruning unpromising decoding processes during the search is proposed. By comparing temporal search statistics at each frame among all decoders, decoders with relatively unmatched models can be pruned in the middle of the recognition process to save computational cost. This method allows the model structures to be mutually independent. Two frame-wise pruning measures, based on maximum hypothesis likelihoods and top confidence scores respectively, as well as their combination, are investigated. Experimental results on parallel recognition with seven acoustic models showed that, using both criteria, the total computational cost was reduced to 36.53% of full computation without degrading recognition accuracy.

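    A minimal sketch of the frame-wise pruning idea described above: at each frame, every active decoder reports its best hypothesis score, and decoders falling more than a beam width behind the overall best are stopped. The toy score curves and beam value are invented; real decoders would report actual hypothesis likelihoods or confidence scores.

        def parallel_decode(decoders, n_frames, beam=5.0):
            # Keep only decoders whose best score stays within `beam`
            # of the best score over all active decoders, frame by frame.
            active = set(decoders)
            for t in range(n_frames):
                best = {d: decoders[d](t) for d in active}
                top = max(best.values())
                active = {d for d in active if best[d] >= top - beam}
            return active

        # Toy per-frame best log-likelihood curves for three acoustic models.
        decoders = {
            "adult":   lambda t: -1.0 * t,   # matches the input well
            "child":   lambda t: -1.6 * t,   # falls behind and is pruned
            "elderly": lambda t: -1.1 * t,
        }
        print(sorted(parallel_decode(decoders, n_frames=20)))  # ['adult', 'elderly']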

  • Hidden semi-Markov model based speech recognition system using weighted finite-state transducer Reviewed

    Keiichiro Oura, Heiga Zen, Yoshihiko Nankaku, Akinobu Lee, Keiichi Tokuda

    2006 IEEE International Conference on Acoustics, Speech and Signal Processing, Vols 1-13   33 - 36   2006

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    In hidden Markov models (HMMs), state duration probabilities decrease exponentially with time, which is an inappropriate representation of the temporal structure of speech. One solution to this problem is to integrate state duration probability distributions explicitly into the HMM. This form is known as a hidden semi-Markov model (HSMM) [1]. Although a number of attempts to use explicit duration models in speech recognition systems have been proposed, they are not consistent because various approximations were used in both training and decoding.
    In the present paper, a fully consistent speech recognition system based on the HSMM framework is proposed. In a speaker-dependent continuous speech recognition experiment, the HSMM-based speech recognition system achieved about 5.9% relative error reduction over the corresponding HMM-based one.

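    A worked comparison behind the abstract's point about duration modeling: an HMM self-loop with probability p implies a geometric state-duration distribution, whereas an HSMM attaches an explicit duration density to each state. The Gaussian duration and the numbers below are illustrative choices, not the models used in the paper.

        import math

        p = 0.8   # HMM self-loop probability

        def hmm_duration(d):
            # Geometric: P(state lasts d frames) = p^(d-1) * (1 - p)
            return p ** (d - 1) * (1 - p)

        def hsmm_duration(d, mean=8.0, sd=2.0):
            # Explicit Gaussian duration density attached to the state
            z = (d - mean) / sd
            return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

        for d in (1, 4, 8, 12):
            print(f"d={d:2d}  geometric={hmm_duration(d):.3f}  gaussian={hsmm_duration(d):.3f}")
        # The geometric curve peaks at d=1 and decays exponentially, while the
        # HSMM concentrates probability around a typical state duration.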

  • Embedded Julius: Continuous speech recognition software for microprocessor Reviewed

    Hiroaki Kokubo, Nobuo Hataoka, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    2006 IEEE WORKSHOP ON MULTIMEDIA SIGNAL PROCESSING   378 - +   2006

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    To extend CSR (continuous speech recognition) software to mobile environments, we have developed an embedded version of "Julius". Julius is open-source CSR software that has been used by many researchers and developers in Japan as a standard decoder on PCs. Julius works as a real-time decoder on a PC; however, further computational reduction is necessary to run it on a microprocessor. To reduce the cost of calculating pdfs (probability density functions), Julius adopts a GMS (Gaussian Mixture Selection) method. In this paper, we modify the GMS method to realize a continuous speech recognizer on microprocessors. This approach does not change the structure of the acoustic models, keeping consistency with those used by conventional Julius, and enables developers to use acoustic models built with popular modeling tools. In simulation, the proposed method achieved a 20% reduction in computational cost compared to conventional GMS and a 40% reduction compared to no GMS. Finally, the embedded version of Julius was tested on a development hardware platform named "T-Engine". The proposed method showed an RTF (Real Time Factor) of 2.23, 79% of that without GMS, without any degradation of recognition performance.

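    The sketch below illustrates Gaussian Mixture Selection in the spirit described above: a cheap single-Gaussian pass ranks all states, and the expensive full-mixture likelihood is evaluated only for the top-ranked states, with the rest backing off to the coarse score. The toy models and sizes are invented and unrelated to Julius internals.

        import numpy as np

        rng = np.random.default_rng(1)
        n_states, n_mix, dim, k_select = 200, 16, 25, 40

        coarse_mu = rng.normal(size=(n_states, dim))        # one Gaussian per state
        full_mu = rng.normal(size=(n_states, n_mix, dim))   # full mixtures per state

        def log_gauss(x, mu):
            # Unit-variance diagonal Gaussian log-density (up to a constant)
            return -0.5 * np.sum((x - mu) ** 2, axis=-1)

        def gms_scores(x):
            coarse = log_gauss(x, coarse_mu)           # cheap pass over all states
            selected = np.argsort(coarse)[-k_select:]  # keep the top-ranked states
            scores = coarse.copy()                     # others back off to coarse
            for s in selected:                         # expensive mixture pass,
                scores[s] = log_gauss(x, full_mu[s]).max()  # only where selected
            return scores

        frame = rng.normal(size=dim)
        print(f"full mixtures computed for {k_select} of {n_states} states")
        print("best state:", int(np.argmax(gms_scores(frame))))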

  • Embedded Julius on T-Engine platform Reviewed

    Nobuo Hataoka, Hiroaki Kokubo, Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano

    2006 INTERNATIONAL SYMPOSIUM ON INTELLIGENT SIGNAL PROCESSING AND COMMUNICATIONS, VOLS 1 AND 2   37 - +   2006

    Language:English   Publishing type:Research paper (international conference proceedings)   Publisher:IEEE  

    In this paper, we report implementation results of an embedded version of Julius. We used T-Engine (TM), which has a SuperH microprocessor, as the hardware platform. Julius is free and open Continuous Speech Recognition (CSR) software that runs on Personal Computers (PCs), which have large CPU power and memory. The technical problems in building an embedded version of Julius are reducing the computation and memory requirements of the Julius software. We achieved an RTF (Real Time Factor) of 2.23 for embedded speech recognition with a 5,000-word vocabulary without any degradation of recognition accuracy.

  • Galatea: Open-Source Software for Developing Anthropomorphic Spoken Dialog Agents Reviewed

    Shinichi Kawamoto, Hiroshi Shimodaira, Tsuneo Nitta, Takuya Nishimoto, Satoshi Nakamura, Katsunobu Itou, Shigeo Morishima, Tatsuo Yotsukura, Atsuhiko Kai, Akinobu Lee, Yoichi Yamashita, Takao Kobayashi, Keiichi Tokuda, Keikichi Hirose, Nobuaki Minematsu, Atsushi Yamada, Yasuharu Den, Takehito Utsuro, Shigeki Sagayama

    Life-like characters - tools, affective functions, and applications.   187 - 212   2004

    Publisher:Springer

  • Recent progress of open-source LVCSR engine Julius and Japanese model repository - Software of continuous speech recognition consortium

    Tatsuya Kawahara, Akinobu Lee, Kazuya Takeda, Katsunobu Itou, Kiyohiro Shikano

    8th International Conference on Spoken Language Processing, ICSLP 2004   3069 - 3072   2004

    Publishing type:Research paper (international conference proceedings)  

    The Continuous Speech Recognition Consortium (CSRC) was founded for further enhancement of the Japanese Dictation Toolkit, which had been developed with the support of a Japanese agency. An overview of its product software is reported in this paper. The open-source LVCSR (large vocabulary continuous speech recognition) engine Julius has been improved in both performance and functionality, and it has also been ported to Microsoft Windows in compliance with SAPI (Speech API). The software is now used for numerous languages and in a wide range of applications. For plug-and-play speech recognition in various applications, we have also compiled a repository of acoustic and language models for Japanese. In particular, the set of acoustic models achieves wide coverage of user generations and speech-input environments.
