SAKO Shinji

Affiliation Department etc.

Department of Computer Science

Title

Associate Professor

Graduating School

  • 1995.04
    -
    1999.03

    Nagoya Institute of Technology   Faculty of Engineering   Graduated

Graduate School

  • 2001.04
    -
    2004.03

    Nagoya Institute of Technology  Graduate School, Division of Engineering  Department of Electrical & Computer Engineering  Doctor's Course  Completed

External Career

  • 2016.07
    -
    2017.03

    Technical University of Munich   Institute for Human-Machine Communication   Researcher  

  • 2014.07
    -
    2014.08

    AGH University of Science and Technology   Faculty of Computer Science, Electronics and Telecommunications   Guest Scientist  

  • 2012.06
    -
    2012.12

    Technical University of Munich   Institute for Human-Machine Communication   Guest Scientist  

  • 2004.04
    -
    2007.03

    The University of Tokyo Graduate School of Information Science and Technology   Research Assistant  

  • 2003.04
    -
    2003.06

    Advanced Telecommunications Research Institute International  

Academic Society Affiliations

  • 2010.06
    -
    Now

    Japanese Association of Sign Linguistics

  • 2010.06
    -
    Now

    Human Interface Society

  • 2007.10
    -
    Now

    The Institute of Image Information and Television Engineers

  • 2005.10
    -
    Now

    The Japanese Society for Artificial Intelligence

  • 2001.03
    -
    Now

    Acoustical Society of Japan

Field of expertise (Grants-in-aid for Scientific Research classification)

  • Kansei informatics

  • Rehabilitation science/Welfare engineering

  • Perceptual information processing

 

Thesis for a degree

  • Audio-Visual Speech/Singing-voice Synthesis and Gesture Recognition for Multimodal Human Computer Interaction

    Shinji Sako 

      2004.03

    8   1

Papers

  • Learning Siamese Features for Finger Spelling Recognition

    Bogdan Kwolek, Shinji Sako

    Advanced Concepts for Intelligent Vision Systems     2017.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    This paper is devoted to finger spelling recognition on the basis of images acquired by a single color camera. The recognition is realized on the basis of learned low-dimensional embeddings. The embeddings are calculated by both single and multiple siamese-based convolutional neural networks. We train classifiers operating on such features as well as convolutional neural networks operating on raw images. The evaluations are performed on a freely available dataset with finger spellings of Japanese Sign Language. The best results are achieved by a classifier trained on concatenated features of multiple siamese networks.

  • Recognition of JSL finger spelling using convolutional neural networks

    Hana Hosoe, Shinji Sako, Bogdan Kwolek

    15th IAPR International Conference on Machine Vision Applications (MVA) ( IEEE )    85 - 88   2017.07  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    Recently, a few methods for recognition of hand postures on depth maps using convolutional neural networks were proposed. In this paper, we present a framework for recognition of static finger spelling in Japanese Sign Language. The recognition takes place on the basis of a single gray image. The finger spelled signs are recognized using a convolutional neural network. A dataset consisting of 5000 samples has been recorded. A 3D articulated hand model has been designed to generate synthetic finger spellings and to extend the real hand gestures. Experimental results demonstrate that, owing to a sufficient amount of training data, a high recognition rate can be attained on images from a single RGB camera. The full dataset and Caffe model are available for download.

  • Japanese Sign Language Recognition Based on Three Elements of Sign Using Kinect v2 Sensor

    Shohei Awata, Shinji Sako, Tadashi Kitamura

    International Conference on Human-Computer Interaction 2017   713   95 - 102   2017.07

    Research paper (international conference proceedings)   Multiple Authorship

    The visual features of Japanese sign language are divided into two types: manual signals and non-manual signals. Manual signals are represented by the shape and motion of the hands, and mainly convey the meaning of sign language words. In terms of phonology, sign language words consist of three elements: the hand's motion, position, and shape. We have developed a recognition system for Japanese sign language (JSL) with abstraction of manual signals based on these three elements. The abstraction of manual signals is performed based on a dictionary of Japanese sign language words. Features such as hand coordinates and depth images are extracted from manual signals using a depth sensor, the Kinect v2. The system recognizes the three elements independently, and the final result is obtained by a comprehensive judgment over the three recognition results. In this paper, we used two methods for recognition of hand shape: a contour-based method suggested by Keogh and template matching of depth images. For the other elements, we used a hidden Markov model for recognition of motion and a normal distribution learned by maximum likelihood estimation for recognition of position, in the same manner as in our previous research. Based on our proposed method, we prepared recognition methods for each element and conducted an experiment on recognizing 400 sign language words based on a sign language word dictionary.

  • Real-Time Japanese Sign Language Recognition Based on Three Phonological Elements of Sign

    Shinji Sako, Mika Hatano, Tadashi Kitamura

    18th International Conference HCI International 2016, Communications in Computer and Information Science   618   130 - 136   2016.06  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    Sign language is the visual language of deaf people. It is also a natural language, different in form from spoken language. To resolve the communication barrier between hearing and deaf people, several research efforts on automatic sign language recognition (ASLR) systems are now under way. However, existing ASLR research deals with only small vocabularies. It is also limited in its environmental conditions and use of equipment. In addition, compared with the research field of speech recognition, there is no large-scale sign database for various reasons. One of the major reasons is that there is no official writing system for Japanese Sign Language (JSL). In such a situation, we focused on the use of knowledge of JSL phonology and a dictionary in order to develop a real-time JSL sign recognition system. The dictionary consists of over 2,000 JSL signs, each sign defined by three types of phonological elements in JSL: hand shape, motion, and position. Thanks to the use of the dictionary, JSL sign models are represented by the combination of these elements. It can also accommodate the addition of new signs. Our system employs the Kinect v2 sensor to obtain sign features such as hand shape, position, and motion. The depth sensor enables real-time processing and robustness against environmental changes. In general, recognition of hand shape is not easy in the field of ASLR due to the complexity of hand shape. In our research, we apply a contour-based method to hand shape recognition. To recognize hand motion and position, we adopted statistical models such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). To address the lack of a database, our method utilizes pseudo motion and hand shape data. We conduct experiments to recognize 223 JSL signs targeted at professional sign language interpreters.

  • Automatic Performance Rendering Method for Keyboard Instruments based on Statistical Model that Associates Performance Expression and Musical Notation

    Kenta Okumura, Shinji Sako, Tadashi Kitamura

    Journal of Japan Society for Fuzzy Theory and Intelligent Informatics ( Japan Society for Fuzzy Theory and Intelligent Informatics )  28 ( 2 ) 557 - 569   2016.04  [Refereed]

    Research paper (scientific journal)   Multiple Authorship

    This paper proposes a method for the automatic rendition of performances without losing any characteristics of the specific performer. In many existing methods, users are required to input expertise such as that possessed by the performer. Although they are useful in support of users' own performances, they are not suitable for the purpose of this proposal. The proposed method defines a model that associates the feature quantities of expression extracted from cases of actual performance with the directions that can be reliably retrieved from the musical score without using expertise. By classifying the expressive tendency of the model for each case of performance using criteria based on score directions, the rules that elucidate the causal relationship between the performer's specific performance expression and the score directions can be structured systematically. The candidates of the performance cases corresponding to unseen score directions are obtained by tracing this structure. Dynamic programming is applied to solve the problem of searching for the sequence of performance cases with the optimal expression from among these candidates. Objective evaluations indicated that the proposed method is able to efficiently render optimal performances. From subjective evaluations, the quality of the expression rendered by the proposed method was confirmed. It was also shown that the characteristics of the performer could be reproduced even in various compositions. Furthermore, performances rendered via the proposed method have won the first prize in the autonomous section of a performance rendering contest for computer systems.

  • Comparative Analysis of Performance Expression using Similarity Metrics based on Statistical Model and Musical Score Information

    Kenta Okumura, Shinji Sako, Tadashi Kitamura

    Transactions of Japan Society of Kansei Engineering   15 ( 1 ) 255 - 263   2016.02  [Refereed]

    Research paper (scientific journal)   Multiple Authorship

  • Contour-based Hand Pose Recognition for Sign Language Recognition

    Mika Hatano, Shinji Sako, Tadashi Kitamura

    6th Workshop on Speech and Language Processing for Assistive Technologies     2015.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    We are developing a real-time Japanese sign language recognition system that employs abstract hand motions based on three elements familiar to sign language: hand motion, position, and pose. This study considers the method of hand pose recognition using depth images obtained from the Kinect v2 sensor. We apply the contour-based method proposed by Keogh to hand pose recognition. This method recognizes a contour by means of discriminators generated from contours. We conducted experiments on recognizing 23 hand poses from 400 Japanese sign language words.

  • Violin Fingering Estimation According to the Performer's Skill Level Based on Conditional Random Field

    Shinji Sako, Wakana Nagata, Tadashi Kitamura

    Human-Computer Interaction, Part II, HCII 2015, LNCS 9170     485 - 494   2015.08  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    In this paper, we propose a method that estimates appropriate violin fingering according to the performer’s skill level based on a conditional random field (CRF). A violin is an instrument that can produce the same pitch for different fingering patterns, and these patterns depend on skill level. We previously proposed a statistical method for violin fingering estimation, but that method required a certain amount of training data in the form of fingering annotation corresponding to each note in the music score. This was a major issue of our previous method, because it takes time and effort to produce the annotations. To solve this problem, we proposed a method to automatically generate training data for a fingering model using existing violin textbooks. Our experimental results confirmed the effectiveness of the proposed method.

  • Violin Fingering Estimation According to Skill Level based on Hidden Markov Model

    Wakana Nagata, Shinji Sako, Tadashi Kitamura

    Proceedings ICMC|SMC|2014     1233 - 1238   2014.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    This paper describes a method that estimates the appropriate violin fingering pattern according to the player’s skill level. A violin can produce the same pitch for different fingering patterns, which generally vary depending on skill level. Our proposed method translates musical scores into suitable fingering patterns for the desired skill level by modeling a violin player’s left hand based on a hidden Markov model. In this model, fingering is regarded as the hidden state and the output is the musical note in the score. We consider that differences in fingering patterns depend on skill level, which determines the prioritization between ease of playing and performance expression, and this priority is related to the output probability. Transition probability is defined by the appropriateness and ease of the transitions between states in the musical composition. Manually setting optimal model parameters for these probabilities is difficult because they are too numerous. Therefore, we decide on the parameters by training with textbook fingering. Experimental results show that fingering can be estimated for a skill level using the proposed method. The results of evaluations conducted of the method’s fingering patterns for beginners indicate that they are as good as or better than textbook fingering patterns.

  • Laminae: A stochastic modeling-based autonomous performance rendering system that elucidates performer characteristics

    Kenta Okumura, Shinji Sako, Tadashi Kitamura

    Proceedings ICMC|SMC|2014     1271 - 1276   2014.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    This paper proposes a system for performance rendering of keyboard instruments. The goal is fully autonomous rendition of a performance with musical smoothness without losing any of the characteristics of the actual performer. The system is based on a method that systematizes combinations of constraints and thereby elucidates the rendering process of the performer’s performance by defining stochastic models that associate artistic deviations observed in a performance with the contextual information notated in its musical score. The proposed system can be used to search for a sequence of optimum cases from the combination of all existing cases of the existing performance observed to render an unseen performance efficiently. Evaluations conducted indicate that musical features expected in existing performances are transcribed appropriately in the performances rendered by the system. The evaluations also demonstrate that the system is able to render performances with natural expressions stably, even for compositions with unconventional styles. Consequently, performances rendered via the proposed system have won first prize in the autonomous section of a performance rendering contest for computer systems.

Review Papers

  • HMM-based Automatic Sign Language Recognition using Phonemic Structure of Japanese Sign Language

    Shinji Sako, Tadashi Kitamura

    Journal of the Japan Society for Welfare Engineering ( Japan Society for Welfare Engineering )  17 ( 2 ) 2 - 7   2015.11

    Introduction and explanation (international conference proceedings)   Multiple Authorship

  • Speech/Sound based Human Interfaces (1) Construction of Speech Synthesis Systems using HTS

    Keiichiro Oura, Heiga Zen, Shinji Sako, Keiichi Tokuda

    Human Interface ( Human Interface Society )  12 ( 1 ) 35 - 40   2010.02  [Refereed]

    Introduction and explanation (international conference proceedings)   Multiple Authorship

Presentations

  • Non-word speech recognition by Julius and Chainer

    Toshiharu Tadano, Masahiko Nawate, Hiroshi Ito, Shinji Sako, Kazuo Kadowaki

    17th Forum on Information Technology (FIT2017)  (University of Tokyo, Hongo Campus)  2017.09  -  2017.09  IPSJ, IEICE Information and System Society, IEICE Human Communication Group

  • A study on JSL Finger Spelling Recognition Using Convolutional Neural Networks

    Shinji Sako, Hana Hosoe, Bogdan Kwolek

    IEICE 90th Technical Committee on Well-being Information Technology (WIT)  (RION Co., LTD.)  2017.05  -  2017.05  Institute of Electronics, Information and Communication Engineers

  • Audio-to-score alignment considering score information based on segmental conditional random fields

    Ayako Noguchi, Shinji Sako, Tadashi Kitamura

    2017 Spring Meeting Acoustical Society of Japan  (Meiji University, Ikuta Campus)  2017.03  -  2017.03  Acoustical Society of Japan

  • A Study on Automating Phonological Tests Using SVM-based Correct/Incorrect Judgment of Non-words

    多々納 俊治, 縄手 雅彦, 伊藤 史人, 酒向 慎司

    IEICE HCG Symposium 2016  (Kochi City Culture Plaza Cul-Port)  2016.12  -  2016.12  Institute of Electronics, Information and Communication Engineers

  • Vowel duration dependent hidden Markov model for automatic lyrics recognition

    Shohei Awata, Shinji Sako, Tadashi Kitamura

    th Joint Meeting of the Acoustical Society of America and Acoustical Society of Japan  (Honolulu, Hawaii)  2016.11  -  2016.12  Acoustical Society of America, Acoustical Society of Japan

  • Segmental Conditional Random Fields Audio-to-Score Alignment Distinguishing Percussion Sounds From Other Instruments

    Ayako Noguchi, Shinji Sako, Tadashi Kitamura

    th Joint Meeting of the Acoustical Society of America and Acoustical Society of Japan  (Honolulu, Hawaii)  2016.11  -  2016.12  Acoustical Society of America, Acoustical Society of Japan

  • Fundamental Study on Automatic Recognition of Non-Manual Signals in JSL using HMM

    Rina Kato, Shinji Sako, Tadashi Kitamura

    IEICE 85th Technical Committee on Well-being Information Technology (WIT)  (University of Yamanashi)  2016.07  -  2016.07  Institute of Electronics, Information and Communication Engineers

  • Recognition of Non-manual signal of sign language using HMM

    Rina Kato, Shinji Sako, Tadashi Kitamura

    2016 IEICE General Conference, ISS Student Poster Session  (Kyushu University, Ito Campus)  2016.03  -  2016.03  Institute of Electronics, Information and Communication Engineers

  • Route choice behavior model of the pedestrian based on the spatial orientation

    Kayo Osako, Shinji Sako, Tadashi Kitamura

    2016 IEICE General Conference, ISS Student Poster Session  (Kyushu University, Ito Campus)  2016.03  -  2016.03  Institute of Electronics, Information and Communication Engineers

  • Detection of Rakugo role alternation using the multi-modal information

    Hana Hosoe, Shinji Sako, Tadashi Kitamura

    2016 IEICE General Conference, ISS Student Poster Session  (Kyushu University, Ito Campus)  2016.03  -  2016.03  Institute of Electronics, Information and Communication Engineers

Work

  • Ryry: Automatic Accompaniment System Capable of Polyphonic Instruments

    Software  2013.03  -  2013.03

Academic Awards Received

  • Japan Society for Fuzzy Theory and Intelligent Informatics Best Paper Award

    2017.09.14    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

    This paper proposes a method for the automatic rendition of performances without losing any characteristics of the specific performer. In many existing methods, users are required to input expertise such as that possessed by the performer. Although they are useful in support of users' own performances, they are not suitable for the purpose of this proposal. The proposed method defines a model that associates the feature quantities of expression extracted from cases of actual performance with the directions that can be reliably retrieved from the musical score without using expertise. By classifying the expressive tendency of the model for each case of performance using criteria based on score directions, the rules that elucidate the causal relationship between the performer's specific performance expression and the score directions can be structured systematically. The candidates of the performance cases corresponding to unseen score directions are obtained by tracing this structure. Dynamic programming is applied to solve the problem of searching for the sequence of performance cases with the optimal expression from among these candidates. Objective evaluations indicated that the proposed method is able to efficiently render optimal performances. From subjective evaluations, the quality of the expression rendered by the proposed method was confirmed. It was also shown that the characteristics of the performer could be reproduced even in various compositions. Furthermore, performances rendered via the proposed method have won the first prize in the autonomous section of a performance rendering contest for computer systems.

  • 78th National Convention of IPSJ, Student Encouragement Award

    2016.03.11    

    Winner: Naoto Sato, Shinji Sako, Tadashi Kitamura

  • IPSJ Yamashita SIG Research Award

    2016.03    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

  • 76th National Convention of IPSJ, Student Encouragement Award

    2014.03.13    

    Winner: Ai Zukawa, Shinji Sako, Tadashi Kitamura

  • 76th National Convention of IPSJ, Student Encouragement Award

    2014.03.13    

    Winner: Kana Miyata, Shinji Sako, Tadashi Kitamura

  • Acoustical Society of Japan, Tokai Branch, Best Presentation Award

    2013.09    

    Winner: Ryuichi Yamamoto, Shinji Sako, Tadashi Kitamura

  • Forum on Information Technology Encouragement Award 2013

    2013.09    

    Winner: Wakana Nagata, Shinji Sako, Tadashi Kitamura

  • IPSJ Tokai Branch, Student Paper Encouragement Award

    2013.05.19    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

    This paper presents a method for describing the characteristics of human musical performance. We consider the problem of building models that express the ways in which deviations from a strict interpretation of the score occur in the performance, and that cluster these deviations automatically. The clustering process is performed using expressive representations unambiguously notated on the musical score, without any arbitrariness by the human observer. The result of clustering is obtained as hierarchical tree structures for each deviational factor that occurred during the operation of the instrument. This structure represents an approximation of the performer's interpretation with information notated on the score they used during the performance. By applying the method to data measured from real performances, we show that the use of information regarding expressive representation on the musical score enables the efficient estimation of a generative model for the musical performance. In addition, the method is also useful for objectively verifying existing knowledge about musical performance, since our model yields information that supports such knowledge.

  • Tokai-Section Joint Conference on Electrical and Related Engineering, Encouragement Award

    2013.01.22    

    Winner: Wakana Nagata, Shinji Sako, Tadashi Kitamura

  • Acoustical Society of Japan, Tokai Branch, Best Presentation Award

    2012.12.12    

    Winner: Ryuichi Yamamoto, Shinji Sako, Tadashi Kitamura

 
 

Academic Activity

  • 2015.06
    -
    Now

    The Institute of Electronics, Information and Communication Engineers  

  • 2015.01
    -
    2016.01

    The Institute of Electronics, Information and Communication Engineers  

  • 2015.01
    -
    2016.01

    The Institute of Electronics, Information and Communication Engineers  

  • 2015.01
    -
    2015.03

    The Institute of Electronics, Information and Communication Engineers  

  • 2013.05
    -
    2015.06

    The Institute of Electronics, Information and Communication Engineers  

  • 2013.04
    -
    2014.09

    Acoustical Society of Japan  

  • 2011.05
    -
    2013.04

    The Institute of Electronics, Information and Communication Engineers  

  • 2010.05
    -
    2011.03

    The Institute of Electronics, Information and Communication Engineers  

  • 2010.04
    -
    Now

    Acoustical Society of Japan  

  • 2009.04
    -
    2013.03

    Acoustical Society of Japan