Papers - LEE Akinobu
-
Real-time word confidence scoring using local posterior probabilities on tree trellis search
Akinobu Lee, Kiyohiro Shikano, Tatsuya Kawahara
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 1 I793 - I796 2004
Language:English Publishing type:Research paper (international conference proceedings)
Confidence scoring based on word posterior probability is usually performed as a post-process of speech recognition decoding, and it also needs a large number of word hypotheses to reach sufficient confidence quality. We propose a simple way of computing word confidence using estimated posterior probabilities while decoding. At the word expansion step of stack decoding search, the local sentence likelihoods, which contain heuristic scores for the unreached segment, are used directly to compute the posterior probabilities. Experimental results showed that, although the likelihoods are not optimal, the method provides slightly better confidence measures than N-best lists, while the computation is faster than the 100-best method because no N-best decoding is required.
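As a rough illustration of the idea in this abstract, the sketch below turns the scores of competing word hypotheses into posterior-like confidences by normalizing scaled likelihoods. The function name `word_confidences`, the scaling factor `alpha`, and the example scores are assumptions made for illustration; they are not taken from the paper.

```python
import math

def word_confidences(log_likelihoods, alpha=0.05):
    """Normalize scaled hypothesis log-likelihoods into confidences.

    log_likelihoods: dict mapping a candidate word to the (hypothetical)
    local sentence log-likelihood of the hypothesis that ends with it.
    alpha: assumed scaling factor that flattens the score range.
    """
    scaled = {w: alpha * s for w, s in log_likelihoods.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scaled.values())
    denom = sum(math.exp(s - m) for s in scaled.values())
    return {w: math.exp(s - m) / denom for w, s in scaled.items()}

# Hypothetical scores for three words competing at one expansion step.
scores = {"hello": -1200.0, "yellow": -1215.0, "hollow": -1240.0}
conf = word_confidences(scores)
```

The confidences sum to one over the competing words, so a word that clearly outscores its rivals gets a value close to one.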
-
Development of Anthropomorphic Spoken Dialogue Agent Toolkit
Sagayama,Shigeki, Itou,Katsunobu, Utsuro,Takehito, Kai,Atsuhiko, Kobayashi,Takao, Shimodaira,Hiroshi, Den,Yasuharu, Tokuda,Keiichi, Nakamura,Satoshi, Nishimoto,Takuya, Nitta,Tsuneo, Hirose,Keikichi, Minematsu,Nobuaki, Morishima,Shigeo, Yamashita,Yoichi, Yamada,Atsushi, Lee,Akinobu
IPSJ SIG Notes 2003 ( 124 ) 319 - 324 2003.12
Language:Japanese Publishing type:Research paper (scientific journal) Publisher:The Institute of Electronics, Information and Communication Engineers (IEICE)
-
Galatea : An Anthropomorphic Spoken Dialogue Agent Toolkit
Sagayama,Shigeki, Kawamoto,Shin-ichi, Shimodaira,Hiroshi, Nitta,Tsuneo, Nishimoto,Takuya, Nakamura,Satoshi, Itou,Katsunobu, Morishima,Shigeo, Yotsukura,Tatsuo, Kai,Atsuhiko, Lee,Akinobu, Yamashita,Yoichi, Kobayashi,Takao, Tokuda,Keiichi, Hirose,Keikichi, Minematsu,Nobuaki, Yamada,Atsushi, Den,Yasuharu, Utsuro,Takehito
IPSJ SIG Notes 2003 ( 14 ) 57 - 64 2003.02
Language:Japanese Publishing type:Research paper (scientific journal) Publisher:Information Processing Society of Japan (IPSJ)
-
Complemental Back-off Algorithm for Merging Language Models
NAGATOMO KENTARO, NISIMURA RYUICHI, KOMATSU KUMIKO, KURODA YUKA, LEE AKINOBU, SARUWATARI HIROSHI, SHIKANO KIYOHIRO
IPSJ Journal 43 ( 9 ) 2884 - 2893 2002.09
Language:Japanese Publisher:Information Processing Society of Japan (IPSJ)
A new complemental back-off algorithm for merging two N-gram language models is proposed. By merging several topic-dependent or style-dependent models, we can easily construct a general model that covers a wider range of topics. However, conventional methods that simply concatenate the training corpora or interpolate the probabilities often level off the task-dependent characteristics of each language model and weaken the overall linguistic constraint. We propose a new back-off scheme that assigns the unseen N-gram probabilities according to the probabilities of the other model. It can assign more reliable probabilities to the unseen N-grams, and the original corpora are not needed for merging. We implemented a command-line tool that realizes this method and evaluated it on three recognition tasks (medical consulting, food recipe query, and newspaper articles). The results show that the merged model retains the accuracy of each original model.
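The general idea of backing off to the other model for unseen N-grams can be sketched with a toy bigram example. This is a simplified illustration under stated assumptions, not the algorithm from the paper: the function `merged_bigram_prob`, the dictionary layout, and the way leftover probability mass is redistributed are all hypothetical.

```python
def merged_bigram_prob(word, context, model_a, model_b):
    """Toy back-off: trust model A where it has seen the bigram,
    otherwise share A's leftover mass according to model B."""
    seen = model_a.get(context, {})
    if word in seen:
        return seen[word]
    # Probability mass that model A left for bigrams it has not seen.
    leftover = 1.0 - sum(seen.values())
    other = model_b.get(context, {})
    # Mass that model B gives to the words model A has not seen.
    unseen_mass = sum(p for w, p in other.items() if w not in seen)
    if unseen_mass == 0.0:
        return 0.0
    return leftover * other.get(word, 0.0) / unseen_mass

# Hypothetical discounted bigram tables: {context: {word: probability}}.
model_a = {"take": {"medicine": 0.6, "pills": 0.2}}
model_b = {"take": {"medicine": 0.1, "flour": 0.5, "sugar": 0.4}}
vocab = ["medicine", "pills", "flour", "sugar"]
probs = {w: merged_bigram_prob(w, "take", model_a, model_b) for w in vocab}
```

Only the two probability tables are consulted, which mirrors the abstract's point that merging does not require the original corpora.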
-
Design of Software Toolkit for Anthropomorphic Spoken Dialog Agent Software with Customization-Oriented Features Reviewed
Shin-ichi Kawamoto, Hiroshi Shimodaira, Tsuneo Nitta, Takuya Nishimoto, Satoshi Nakamura, Katsunobu Itou, Shigeo Morishima, Tatsuo Yotsukura, Atsuhiko Kai, Akinobu Lee, Yoichi Yamashita, Takao Kobayashi, Keiichi Tokuda, Keikichi Hirose, Nobuaki Minematsu, Atsushi Yamada, Yasuharu Den, Takehito Utsuro, Shigeki Sagayama
Transactions of Information Processing Society of Japan 43 ( 7 ) 2249 - 2264 2002.05
Language:Japanese Publishing type:Research paper (scientific journal) Publisher:Information Processing Society of Japan
-
Project for Development of Anthropomorphic Spoken-Dialog Agent
SAGAYAMA,Shigeki, ITOU,Katsunobu, UTSURO,Takehito, KAI,Atsuhiko, KOBAYASHI,Takao, SHIMODAIRA,Hiroshi, DEN,Yasuharu, TOKUDA,Keiichi, NAKAMURA,Satoshi, NISHIMOTO,Takuya, NITTA,Tsuneo, HIROSE,Keikichi, MORISHIMA,Shigeo, MINEMATSU,Nobuaki, YAMASHITA,Yoichi, YAMADA,Atsushi, LEE,Akinobu
Proceedings of the Meeting of the Acoustical Society of Japan 2002 ( 1 ) 27 - 28 2002.03
Language:Japanese Publishing type:Research paper (scientific journal)
-
A Design of Anthropomorphic Spoken Dialog Agent Toolkit
Kawamoto,Shin-ichi, Shimodaira,Hiroshi, Nitta,Tsuneo, Nishimoto,Takuya, Nakamura,Satoshi, Itou,Katsunobu, Morishima,Shigeo, Yotsukura,Tatsuo, Kai,Atsuhiko, Lee,Akinobu, Yamashita,Yoichi, Kobayashi,Takao, Tokuda,Keiichi, Hirose,Keikichi, Minematsu,Nobuaki, Yamada,Atsushi, Den,Yasuharu, Utsuro,Takehito, Sagayama,Shigeki
IPSJ SIG Notes. HI, Human Interface 2002 ( 10 ) 61 - 66 2002.02
Language:Japanese Publishing type:Research paper (scientific journal) Publisher:Information Processing Society of Japan (IPSJ)
This paper discusses the design and architecture of a software toolkit for developing an anthropomorphic spoken dialog agent (ASDA) that is easy to customize. Such a human-like voice dialogue agent is one of the promising man-machine interfaces for the next generation. To develop such a toolkit, this paper first discusses the basic requirements that an ASDA system should meet, and then designs the software modules of the system to fulfill those requirements. A prototype agent system has been developed on UNIX-based systems using the toolkit, which is still under development. Finally, the paper discusses the current achievements of the toolkit, which will be made publicly available as free software.
-
Japanese Dictation Toolkit --- 1999 version --- Reviewed
Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano
The Journal of the Acoustical Society of Japan 57 ( 3 ) 210 - 214 2001.03
Language:Japanese Publishing type:Research paper (scientific journal) Publisher:Acoustical Society of Japan
-
Julius - An open source real-time large vocabulary recognition engine
Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano
EUROSPEECH 2001 - SCANDINAVIA - 7th European Conference on Speech Communication and Technology 1691 - 1694 2001
Language:English Publishing type:Research paper (international conference proceedings) Publisher:International Speech Communication Association
Julius is a high-performance, two-pass LVCSR decoder for researchers and developers. Based on word 3-gram and context-dependent HMM, it can perform almost real-time decoding on most current PCs in a 20k-word dictation task. Major search techniques are fully incorporated, such as tree lexicon, N-gram factoring, cross-word context dependency handling, enveloped beam search, Gaussian pruning, and Gaussian selection. Besides search efficiency, it is also carefully modularized to be independent of model structures, and various HMM types are supported, such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted to work with other free modeling toolkits. The main platform is Linux and other Unix workstations, and it partially works on Windows. Julius is distributed under an open license together with its source code, and has been used by many researchers and developers in Japan.
-
Gaussian mixture selection using context-independent HMM Reviewed
Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 1 69 - 72 2001
Language:English Publishing type:Research paper (scientific journal)
We address a method to efficiently select Gaussian mixtures for fast acoustic likelihood computation. It makes use of context-independent models for selection and back-off of the corresponding triphone models. Specifically, triphone models of higher resolution are applied for the k-best phone models found by the preliminary evaluation, and the others are assigned likelihoods from the monophone models. This selection scheme assigns more reliable back-off likelihoods to the unselected states than conventional Gaussian selection based on a VQ codebook. It can also incorporate efficient Gaussian pruning at the preliminary evaluation, which offsets the increased size of the pre-selection model. Experimental results show that the proposed method achieves performance comparable to standard Gaussian selection, and performs much better under aggressive pruning conditions. Together with phonetic tied-mixture (PTM) modeling, the acoustic matching cost is reduced to almost 14% with little loss of accuracy.
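The selection and back-off steps described in this abstract can be sketched on a toy model with one-dimensional, single-Gaussian states. This is a minimal sketch under stated assumptions: the function names, the data layout, and all parameter values are hypothetical, and real systems use multi-dimensional Gaussian mixtures.

```python
import math

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def select_and_score(x, monophones, triphones, k=1):
    """Score triphones only for the k-best monophones; back off otherwise.

    monophones: {phone: (mean, var)}
    triphones:  {name: (base_phone, mean, var)}
    """
    # Preliminary evaluation with the context-independent models.
    mono_scores = {p: log_gauss(x, m, v) for p, (m, v) in monophones.items()}
    best = set(sorted(mono_scores, key=mono_scores.get, reverse=True)[:k])
    scores = {}
    for name, (base, m, v) in triphones.items():
        if base in best:
            # Selected phone: evaluate the detailed triphone model.
            scores[name] = log_gauss(x, m, v)
        else:
            # Unselected phone: reuse the monophone likelihood as back-off.
            scores[name] = mono_scores[base]
    return scores, best

# Hypothetical toy models and one observed feature value.
monophones = {"a": (0.0, 1.0), "i": (3.0, 1.0)}
triphones = {"k-a+t": ("a", 0.2, 0.5), "s-i+t": ("i", 2.8, 0.5)}
scores, best = select_and_score(0.1, monophones, triphones, k=1)
```

In this toy setting the triphone for the unselected phone costs nothing extra to score, because its likelihood is copied from the monophone pass that was already computed.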
-
Large Vocabulary Continuous Speech Recognition using Multi-Pass Search Algorithm Reviewed
Akinobu Lee
2000.09
Language:English Publishing type:Doctoral thesis
-
Japanese Dictation Toolkit --- 1998 version --- Reviewed
Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano
The Journal of the Acoustical Society of Japan 56 ( 4 ) 255 - 259 2000.04
Language:Japanese Publishing type:Research paper (scientific journal) Publisher:Acoustical Society of Japan
-
Free software toolkit for Japanese large vocabulary continuous speech recognition. Reviewed
Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Shigeki Sagayama, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano
Sixth International Conference on Spoken Language Processing, ICSLP 2000 / INTERSPEECH 2000, Beijing, China, October 16-20, 2000 476 - 479 2000
-
A new phonetic tied-mixture model for efficient decoding Reviewed
Akinobu Lee, Tatsuya Kawahara, Kazuya Takeda, Kiyohiro Shikano
ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings 3 1269 - 1272 2000
Language:English Publishing type:Research paper (international conference proceedings) Publisher:Institute of Electrical and Electronics Engineers Inc.
A phonetic tied-mixture (PTM) model for efficient large vocabulary continuous speech recognition is presented. It is synthesized from context-independent phone models with 64 mixture components per state by assigning different mixture weights according to the shared states of triphones. Mixtures are then re-estimated for optimization. The model achieves a word error rate of 7.0% on a 20,000-word dictation task with a newspaper corpus, which is comparable to the best figure obtained by triphone models of much higher resolution. Compared with conventional PTMs that share Gaussians across all states, the proposed model is easily trained and reliably estimated. Furthermore, the model enables the decoder to perform efficient Gaussian pruning. It is found that computing only two out of 64 components does not cause any loss of accuracy. Several pruning methods are proposed and compared, and the best one reduced the computation to about 20%.
-
Japanese Dictation Toolkit --- 1997 version --- Reviewed
Tatsuya Kawahara, Akinobu Lee, Tetsunori Kobayashi, Kazuya Takeda, Nobuaki Minematsu, Katsunobu Itou, Akinori Ito, Mikio Yamamoto, Atsushi Yamada, Takehito Utsuro, Kiyohiro Shikano
The Journal of the Acoustical Society of Japan 55 ( 3 ) 175 - 180 1999.03
Language:English Publishing type:Research paper (scientific journal) Publisher:Acoustical Society of Japan