SAKO Shinji

Affiliation Department etc.

Department of Computer Science
Center for Research on Assistive Technology for Building a New Community

Title

Associate Professor

Graduating School

  • 1995.04
    -
    1999.03

    Nagoya Institute of Technology   Faculty of Engineering   Graduated

Graduate School

  • 2001.04
    -
    2004.03

    Nagoya Institute of Technology   Graduate School, Division of Engineering   Department of Electrical & Computer Engineering   Doctor's Course   Completed

External Career

  • 2016.07
    -
    2017.03

    Technical University of Munich   Institute for Human-Machine Communication   Researcher  

  • 2014.07
    -
    2014.08

    AGH University of Science and Technology   Faculty of Computer Science, Electronics and Telecommunications   Guest Scientist  

  • 2012.06
    -
    2012.12

    Technical University of Munich   Institute for Human-Machine Communication   Guest Scientist  

  • 2004.04
    -
    2007.03

    The University of Tokyo Graduate School of Information Science and Technology   Research Assistant  

  • 2003.04
    -
    2003.06

    Advanced Telecommunications Research Institute International  

Academic Society Affiliations

  • 2010.06
    -
    Now

    Japanese Association of Sign Linguistics

  • 2010.06
    -
    Now

    Human Interface Society

  • 2007.10
    -
    Now

    The Institute of Image Information and Television Engineers

  • 2005.10
    -
    Now

    The Japanese Society for Artificial Intelligence

  • 2001.03
    -
    Now

    Acoustical Society of Japan

Field of expertise (Grants-in-aid for Scientific Research classification)

  • Rehabilitation science/Welfare engineering

  • Kansei informatics

  • Perceptual information processing

 

Thesis for a degree

  • Audio-Visual Speech/Singing-voice Synthesis and Gesture Recognition for Multimodal Human Computer Interaction

    Shinji Sako 

      2004.03

    8   1

Papers

  • Constructing a Japanese Sign Language Multi-Dimensional Database

    Yuji Nagashima, Daisuke Hara, Shinji Sako, Keiko Watanabe, Yasuo Horiuchi, Ritsuko Kikusawa, Naoto Kato, Akira Ichikawa

    The 7th Meeting of Signed and Spoken Language Linguistics (SSLL 2018)     2018.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

  • Learning Siamese Features for Finger Spelling Recognition

    Bogdan Kwolek, Shinji Sako

    Advanced Concepts for Intelligent Vision Systems. ACIVS 2017. Lecture Notes in Computer Science, vol 10617 ( Springer )  10617   225 - 236   2017.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    This paper is devoted to finger spelling recognition on the basis of images acquired by a single color camera. The recognition is realized on the basis of learned low-dimensional embeddings. The embeddings are calculated by both single and multiple siamese-based convolutional neural networks. We train classifiers operating on such features as well as convolutional neural networks operating on raw images. The evaluations are performed on a freely available dataset with finger spellings of Japanese Sign Language. The best results are achieved by a classifier trained on concatenated features of multiple siamese networks.

  • Recognition of JSL finger spelling using convolutional neural networks

    Hana Hosoe, Shinji Sako, Bogdan Kwolek

    15th IAPR International Conference on Machine Vision Applications (MVA) ( IEEE )    85 - 88   2017.07  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    Recently, a few methods for recognition of hand postures on depth maps using convolutional neural networks were proposed. In this paper, we present a framework for recognition of static finger spelling in Japanese Sign Language. The recognition takes place on the basis of a single gray image. The finger spelled signs are recognized using a convolutional neural network. A dataset consisting of 5000 samples has been recorded. A 3D articulated hand model has been designed to generate synthetic finger spellings and to extend the real hand gestures. Experimental results demonstrate that, owing to a sufficient amount of training data, a high recognition rate can be attained on images from a single RGB camera. The full dataset and Caffe model are available for download.

  • Japanese Sign Language Recognition Based on Three Elements of Sign Using Kinect v2 Sensor

    Shohei Awata, Shinji Sako, Tadashi Kitamura

    International Conference on Human-Computer Interaction 2017   713   95 - 102   2017.07

    Research paper (international conference proceedings)   Multiple Authorship

    The visual features of Japanese sign language are divided into two types: manual signals and non-manual signals. Manual signals are represented by the shape and motion of the hands, and mainly convey the meaning of sign language words. In terms of phonology, sign language words consist of three elements: the hand's motion, position, and shape. We have developed a recognition system for Japanese sign language (JSL) with abstraction of manual signals based on these three elements. The abstraction of manual signals is performed based on a dictionary of Japanese sign language words. Features such as hand coordinates and depth images are extracted from manual signals using the Kinect v2 depth sensor. The system recognizes the three elements independently, and the final result is obtained by a comprehensive judgment over the three recognition results. In this paper, we used two methods for recognition of hand shape: a contour-based method suggested by Keogh and template matching of depth images. For the other elements, we used a hidden Markov model for recognition of motion and a normal distribution learned by maximum likelihood estimation for recognition of position, in the same manner as in our previous research. Based on the proposed method, we prepared recognition methods for each element and conducted an experiment on recognizing 400 sign language words based on a sign language word dictionary.

  • Real-Time Japanese Sign Language Recognition Based on Three Phonological Elements of Sign

    Shinji Sako, Mika Hatano, Tadashi Kitamura

    18th International Conference HCI International 2016, Communications in Computer and Information Science   618   130 - 136   2016.06  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    Sign language is the visual language of deaf people. It is also a natural language, different in form from spoken language. To resolve the communication barrier between hearing and deaf people, several research efforts on automatic sign language recognition (ASLR) systems are now under way. However, existing ASLR research deals with only small vocabularies. It is also limited in environmental conditions and in the equipment used. In addition, compared with the research field of speech recognition, there is no large-scale sign database for various reasons. One of the major reasons is that there is no official writing system for Japanese Sign Language (JSL). In such a situation, we focused on the use of knowledge of JSL phonology and a dictionary in order to develop a real-time JSL sign recognition system. The dictionary consists of over 2,000 JSL signs, each defined by three types of phonological elements in JSL: hand shape, motion, and position. Thanks to the use of the dictionary, JSL sign models are represented by combinations of these elements. The system can also accommodate the addition of new signs. Our system employs the Kinect v2 sensor to obtain sign features such as hand shape, position, and motion. The depth sensor enables real-time processing and robustness against environmental changes. In general, recognition of hand shape is not easy in the field of ASLR due to the complexity of hand shapes. In our research, we apply a contour-based method to hand shape recognition. To recognize hand motion and position, we adopted statistical models such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). To address the lack of a database, our method utilizes pseudo motion and hand shape data. We conduct experiments to recognize 223 JSL signs, targeting professional sign language interpreters.

  • Automatic Performance Rendering Method for Keyboard Instruments based on Statistical Model that Associates Performance Expression and Musical Notation

    Kenta Okumura, Shinji Sako, Tadashi Kitamura

    Journal of Japan Society for Fuzzy Theory and Intelligent Informatics ( Japan Society for Fuzzy Theory and Intelligent Informatics )  28 ( 2 ) 557 - 569   2016.04  [Refereed]

    Research paper (scientific journal)   Multiple Authorship

    This paper proposes a method for the automatic rendition of performances without losing any characteristics of the specific performer. In many existing methods, users are required to input expertise such as that possessed by the performer. Although such methods are useful in supporting users' own performances, they are not suitable for the purpose of this proposal. The proposed method defines a model that associates the feature quantities of expression extracted from cases of actual performance with directions that can be reliably retrieved from the musical score without using expertise. By classifying the expressive tendency of the model for each performance case using criteria based on score directions, the rules that elucidate the causal relationship between the performer's specific performance expression and the score directions can be structured systematically. The candidate performance cases corresponding to unseen score directions are obtained by tracing this structure. Dynamic programming is applied to solve the problem of searching for the sequence of performance cases with the optimal expression from among these candidates. Objective evaluations indicated that the proposed method is able to efficiently render optimal performances. Subjective evaluations confirmed the quality of the expression rendered by the proposed method. It was also shown that the characteristics of the performer could be reproduced even in various compositions. Furthermore, performances rendered via the proposed method won first prize in the autonomous section of a performance rendering contest for computer systems.

  • Comparative Analysis of Performance Expression using Similarity Metrics based on Statistical Model and Musical Score Information

    Kenta Okumura, Shinji Sako, Tadashi Kitamura

    Transactions of Japan Society of Kansei Engineering   15 ( 1 ) 255 - 263   2016.02  [Refereed]

    Research paper (scientific journal)   Multiple Authorship

  • Contour-based Hand Pose Recognition for Sign Language Recognition

    Mika Hatano, Shinji Sako, Tadashi Kitamura

    6th Workshop on Speech and Language Processing for Assistive Technologies     17 - 21   2015.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    We are developing a real-time Japanese sign language recognition system that employs abstract hand motions based on three elements familiar to sign language: hand motion, position, and pose. This study considers the method of hand pose recognition using depth images obtained from the Kinect v2 sensor. We apply the contour-based method proposed by Keogh to hand pose recognition. This method recognizes a contour by means of discriminators generated from contours. We conducted experiments on recognizing 23 hand poses from 400 Japanese sign language words.

  • Violin Fingering Estimation According to the Performer's Skill Level Based on Conditional Random Field

    Shinji Sako, Wakana Nagata, Tadashi Kitamura

    Human-Computer Interaction, Part II, HCII 2015, LNCS 9170     485 - 494   2015.08  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    In this paper, we propose a method that estimates appropriate violin fingering according to the performer’s skill level based on a conditional random field (CRF). A violin is an instrument that can produce the same pitch for different fingering patterns, and these patterns depend on skill level. We previously proposed a statistical method for violin fingering estimation, but that method required a certain amount of training data in the form of fingering annotation corresponding to each note in the music score. This was a major issue of our previous method, because it takes time and effort to produce the annotations. To solve this problem, we proposed a method to automatically generate training data for a fingering model using existing violin textbooks. Our experimental results confirmed the effectiveness of the proposed method.

  • Violin Fingering Estimation According to Skill Level based on Hidden Markov Model

    Wakana Nagata, Shinji Sako, Tadashi Kitamura

    Proceedings ICMC|SMC|2014     1233 - 1238   2014.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    This paper describes a method that estimates the appropriate violin fingering pattern according to the player’s skill level. A violin can produce the same pitch for different fingering patterns, which generally vary depending on skill level. Our proposed method translates musical scores into suitable fingering patterns for the desired skill level by modeling a violin player’s left hand based on a hidden Markov model. In this model, fingering is regarded as the hidden state and the output is the musical note in the score. We consider that differences in fingering patterns depend on skill level, which determines the prioritization between ease of playing and performance expression, and this priority is related to the output probability. Transition probability is defined by the appropriateness and ease of the transitions between states in the musical composition. Manually setting optimal model parameters for these probabilities is difficult because they are too numerous. Therefore, we decide on the parameters by training with textbook fingering. Experimental results show that fingering can be estimated for a skill level using the proposed method. Evaluations of the fingering patterns that the method generates for beginners indicate that they are as good as or better than textbook fingering patterns.

Review Papers

  • HMM-based Automatic Sign Language Recognition using Phonemic Structure of Japanese Sign Language

    Shinji Sako, Tadashi Kitamura

    Journal of The Japan Society for Welfare Engineering ( Japan Society for Welfare Engineering )  17 ( 2 ) 2 - 7   2015.11

    Introduction and explanation (international conference proceedings)   Multiple Authorship

  • Speech/Sound based Human Interfaces (1) Construction of Speech Synthesis Systems using HTS

    Keiichiro Oura, Heiga Zen, Shinji Sako, Keiichi Tokuda

    Human interface ( Human interface Society )  12 ( 1 ) 35 - 40   2010.02  [Refereed]

    Introduction and explanation (international conference proceedings)   Multiple Authorship

Presentations

  • A study on fundamental element extraction using motion capture data for non-manual signals in JSL

    Kenta Yasue, Shinji Sako

    The 81st National Convention of IPSJ  (Fukuoka Univ.)  2019.03  -  2019.03  Information Processing Society of Japan

  • Automatic Melody Composition Using Correspondence to Bassline Based on Music Database and Genetic Algorithm

    Kodai Yamada, Shinji Sako

    The 81st National Convention of IPSJ  (Fukuoka Univ.)  2019.03  -  2019.03  Information Processing Society of Japan

  • Violin Fingering Estimation Effective to Beginner Education

    Juri Watanabe, Shinji Sako

    The 81st National Convention of IPSJ  (Fukuoka Univ.)  2019.03  -  2019.03  Information Processing Society of Japan

  • Music recommendation system using collaborative filtering based on individual knowledge of music

    Kodai Takagi, Shinji Sako

    The 81st National Convention of IPSJ  (Fukuoka Univ.)  2019.03  -  2019.03  Information Processing Society of Japan

  • An estimation of degree of excitement by song audio signals

    Kazuki Fukutani, Shinji Sako

    The 81st National Convention of IPSJ  (Fukuoka Univ.)  2019.03  -  2019.03  Information Processing Society of Japan

  • Development of General Purpose 3D high precision DB for Japanese sign language

    Yuji Nagashima, Shinji Sako, Keiko Watanabe, Daisuke Hara, Yasuo Horiuchi, Akira Ichikawa

    IEICE 99th Technical Committee on Well-being Information Technology (WIT)  ( Ehime Univ.)  2019.02  -  2019.02  Institute of Electronics, Information and Communication Engineers

  • Development of 3D super-high precision DB for use in research on sign language vocabulary and grammar

    Yuji Nagashima, Shinji Sako, Keiko Watanabe, Daisuke Hara, Yasuo Horiuchi, Akira Ichikawa

    2018 Autumn Meeting of ASJ  (Oita University)  2018.09  -  2018.09  Acoustical Society of Japan

  • Development of the Super High-Definition and High-Precision Japanese Sign Language Database Available for Various Research Fields

    Yuji Nagashima, Daisuke Hara, Yasuo Horiuchi, Shinji Sako, Keiko Watanabe, Ritsuko Kikusawa, Naoto Kato, Akira Ichikawa

    Language Resource Workshop 2018  (National Institute for Japanese Language and Linguistics)  2018.09  -  2018.09  National Institute for Japanese Language and Linguistics, Center for Corpus Development

  • A Proposal on a Simplified Input Method of Motions in Sign Language CG Wiki

    Tatsuya Yamaguchi, Daisuke Muramatsu, Hiroaki Sawano, Norio Ishi, Yuri Suzuki, Shinji Sako

    The 80th National Convention of IPSJ  (Waseda University)  2018.03  -  2018.03  Information Processing Society of Japan

  • Non-word speech recognition by Julius and Chainer

    Toshiharu Tadano, Masahiko Nawate, Hiroshi Ito, Shinji Sako, Kazuo Kadowaki

    17th Forum on Information Technology (FIT2017)  (University of Tokyo, Hongo Campus)  2017.09  -  2017.09  IPSJ, IEICE Information and System Society, IEICE Human Communication Group

Work

  • Ryry: Automatic Accompaniment System Capable of Polyphonic Instruments

    Software  2013.03  -  2013.03

Academic Awards Received

  • Japan Society for Fuzzy Theory and Intelligent Informatics Best Paper Award

    2017.09.14    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

    This paper proposes a method for the automatic rendition of performances without losing any characteristics of the specific performer. In many existing methods, users are required to input expertise such as that possessed by the performer. Although such methods are useful in supporting users' own performances, they are not suitable for the purpose of this proposal. The proposed method defines a model that associates the feature quantities of expression extracted from cases of actual performance with directions that can be reliably retrieved from the musical score without using expertise. By classifying the expressive tendency of the model for each performance case using criteria based on score directions, the rules that elucidate the causal relationship between the performer's specific performance expression and the score directions can be structured systematically. The candidate performance cases corresponding to unseen score directions are obtained by tracing this structure. Dynamic programming is applied to solve the problem of searching for the sequence of performance cases with the optimal expression from among these candidates. Objective evaluations indicated that the proposed method is able to efficiently render optimal performances. Subjective evaluations confirmed the quality of the expression rendered by the proposed method. It was also shown that the characteristics of the performer could be reproduced even in various compositions. Furthermore, performances rendered via the proposed method won first prize in the autonomous section of a performance rendering contest for computer systems.

  • 78th National Convention of IPSJ, Student Encouragement Award

    2016.03.11    

    Winner: Naoto Sato, Shinji Sako, Tadashi Kitamura

  • IPSJ Yamashita SIG Research Award

    2016.03    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

  • 76th National Convention of IPSJ, Student Encouragement Award

    2014.03.13    

    Winner: Ai Zukawa, Shinji Sako, Tadashi Kitamura

  • 76th National Convention of IPSJ, Student Encouragement Award

    2014.03.13    

    Winner: Kana Miyata, Shinji Sako, Tadashi Kitamura

  • Acoustical Society of Japan, Tokai Branch, Best Presentation Award

    2013.09    

    Winner: Ryuichi Yamamoto, Shinji Sako, Tadashi Kitamura

  • Forum on Information Technology Encouragement Award 2013

    2013.09    

    Winner: Wakana Nagata, Shinji Sako, Tadashi Kitamura

  • IPSJ Tokai Branch, Student Paper Encouragement Award

    2013.05.19    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

    This paper presents a method for describing the characteristics of human musical performance. We consider the problem of building models that express the ways in which deviations from a strict interpretation of the score occur in the performance, and that cluster these deviations automatically. The clustering process is performed using expressive representations unambiguously notated on the musical score, without any arbitrariness on the part of the human observer. The result of clustering is obtained as hierarchical tree structures for each deviational factor that occurred during the operation of the instrument. This structure represents an approximation of the performer's interpretation with information notated on the score they used during the performance. By applying the method to data measured from real performances, we show that the use of information regarding expressive representation on the musical score enables the efficient estimation of a generative model for the musical performance. In addition, the method is useful for objectively verifying existing knowledge about musical performance, because our model provides information that supports such knowledge.

  • Tokai-Section Joint Conference on Electrical and Related Engineering, Encouragement Award

    2013.01.22    

    Winner: Wakana Nagata, Shinji Sako, Tadashi Kitamura

  • Acoustical Society of Japan, Tokai Branch, Best Presentation Award

    2012.12.12    

    Winner: Ryuichi Yamamoto, Shinji Sako, Tadashi Kitamura

Academic Activity

  • 2019.06
    -
    Now

    The Institute of Electronics, Information and Communication Engineers  

  • 2019.01
    -
    Now

    The Institute of Electronics, Information and Communication Engineers  

  • 2018.07
    -
    2019.01

    The Institute of Electronics, Information and Communication Engineers  

  • 2017.07
    -
    2018.01

    The Institute of Electronics, Information and Communication Engineers  

  • 2016.06
    -
    Now

    The Institute of Electronics, Information and Communication Engineers   Language as Real-time Communication, Executive secretary

  • 2015.06
    -
    2019.05

    The Institute of Electronics, Information and Communication Engineers  

  • 2015.01
    -
    2016.01

    The Institute of Electronics, Information and Communication Engineers  

  • 2015.01
    -
    2015.03

    The Institute of Electronics, Information and Communication Engineers  
