SAKO Shinji


Affiliation Department etc.

Department of Computer Science
Center for Research on Assistive Technology for Building a New Community

Title

Associate Professor

Graduating School

  • 1995.04
    -
    1999.03

    Nagoya Institute of Technology   Faculty of Engineering   Graduated

Graduate School

  • 2001.04
    -
    2004.03

    Nagoya Institute of Technology   Graduate School, Division of Engineering   Department of Electrical & Computer Engineering   Doctor's Course   Completed

External Career

  • 2016.07
    -
    2017.03

    Technical University of Munich   Institute for Human-Machine Communication   Researcher  

  • 2014.07
    -
    2014.08

    AGH University of Science and Technology   Faculty of Computer Science, Electronics and Telecommunications   Guest Scientist  

  • 2012.06
    -
    2012.12

    Technical University of Munich   Institute for Human-Machine Communication   Guest Scientist  

  • 2004.04
    -
    2007.03

    The University of Tokyo   Graduate School of Information Science and Technology   Research Assistant  

  • 2003.04
    -
    2003.06

    Advanced Telecommunications Research Institute International  

Academic Society Affiliations

  • 2010.06
    -
    Now

    Japanese Association of Sign Linguistics

  • 2010.06
    -
    Now

    Human Interface Society

  • 2007.10
    -
    Now

    The Institute of Image Information and Television Engineers

  • 2005.10
    -
    Now

    The Japanese Society for Artificial Intelligence

  • 2001.03
    -
    Now

    Acoustical Society of Japan


Field of expertise (Grants-in-aid for Scientific Research classification)

  • Rehabilitation science/Welfare engineering

  • Kansei informatics

  • Perceptual information processing

 

Thesis for a degree

  • Audio-Visual Speech/Singing-voice Synthesis and Gesture Recognition for Multimodal Human Computer Interaction

    Shinji Sako 

      2004.03


Papers

  • Deep CNN-Based Recognition of JSL Finger Spelling

    Nguen Tu Nam, Shinji Sako, Bogdan Kwolek

    Hybrid Artificial Intelligent Systems(HAIS 2019), Lecture Notes in Computer Science ( Springer )  11734   602 - 613   2019.08  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    In this paper, we present a framework for recognition of static finger spelling in Japanese Sign Language on RGB images. The finger-spelled signs were recognized by an ensemble consisting of a ResNet-based convolutional neural network and two ResNet quaternion convolutional neural networks. A 3D articulated hand model was used to generate synthetic finger spellings and to extend a dataset consisting of real hand gestures. Twelve different gesture realizations were prepared for each of 41 signs. Ten images were rendered for each realization through interpolation between the starting and end poses. Experimental results demonstrate that, owing to a sufficient amount of training data, a high recognition rate can be attained on images from a single RGB camera. Results achieved by the ResNet quaternion convolutional neural network are better than results obtained by the ResNet CNN. The best recognition results were achieved by the ensemble. The JSL-rend dataset is available for download.

  • Construction of a Japanese Sign Language Database with Various Data Types

    Keiko Watanabe, Yuji Nagashima, Daisuke Hara, Yasuo Horiuchi, Shinji Sako, Akira Ichikawa

    International Conference on Human-Computer Interaction 2019 ( Springer )  1032   317 - 322   2019.07  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    We have constructed a sign language database which shows 3D animations. We aim to construct an interdisciplinary database that can be used by researchers in various academic fields. This database helps researchers analyze Japanese Sign Language. We have recorded nearly 2,000 Japanese signs to date, and we plan to record approximately 5,000 signs in the database. First, we selected frequently used Japanese words for the database and examined the sign language expression corresponding to each Japanese word. Second, we recorded 3D motion data of the selected sign language expressions using optical motion capture. The motion capture data are stored in C3D, BVH and FBX formats, and the frame rate is 120 fps. In addition, we also recorded full HD video data at 60 fps, super-slow HD data at 30 fps, and depth data at 30 fps, for use in analysis of sign language.

    These are recorded synchronously. In addition, we have developed a new annotation system that can play back the different types of data synchronously to make the database as effective as possible. This is necessary for data analysis because the data were recorded at different frame rates.

  • Discussion of a Japanese sign language database and its annotation systems with consideration for its use in various areas

    Shinji Sako, Yuji Nagashima, Daisuke Hara, Yasuo Horiuchi, Keiko Watanabe, Ritsuko Kikusawa, Naoto Kato, Akira Ichikawa

    Proceedings of LingCologne 2019     2019.06  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

  • Constructing a Japanese Sign Language Multi-Dimensional Database

    Yuji Nagashima, Daisuke Hara, Shinji Sako, Keiko Watanabe, Yasuo Horiuchi, Ritsuko Kikusawa, Naoto Kato, Akira Ichikawa

    The 7th Meeting of Signed and Spoken Language Linguistics (SSLL 2018)     2018.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

  • Learning Siamese Features for Finger Spelling Recognition

    Bogdan Kwolek, Shinji Sako

    Advanced Concepts for Intelligent Vision Systems. ACIVS 2017. Lecture Notes in Computer Science, vol 10617 ( Springer )  10617   225 - 236   2017.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    This paper is devoted to finger spelling recognition on the basis of images acquired by a single color camera. The recognition is realized on the basis of learned low-dimensional embeddings. The embeddings are calculated both by single as well as multiple siamese-based convolutional neural networks. We train classifiers operating on such features as well as convolutional neural networks operating on raw images. The evaluations are performed on freely available dataset with finger spellings of Japanese Sign Language. The best results are achieved by a classifier trained on concatenated features of multiple siamese networks.

  • Recognition of JSL finger spelling using convolutional neural networks

    Hosoe Hana, Shinji Sako, Bogdan Kwolek

    15th IAPR International Conference on Machine Vision Applications (MVA) ( IEEE )    85 - 88   2017.07  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    Recently, a few methods for recognition of hand postures on depth maps using convolutional neural networks have been proposed. In this paper, we present a framework for recognition of static finger spelling in Japanese Sign Language. The recognition takes place on the basis of a single gray image. The finger-spelled signs are recognized using a convolutional neural network. A dataset consisting of 5000 samples has been recorded. A 3D articulated hand model has been designed to generate synthetic finger spellings and to extend the real hand gestures. Experimental results demonstrate that, owing to a sufficient amount of training data, a high recognition rate can be attained on images from a single RGB camera. The full dataset and Caffe model are available for download.

  • Japanese Sign Language Recognition Based on Three Elements of Sign Using Kinect v2 Sensor

    Shohei Awata, Shinji Sako, Tadashi Kitamura

    International Conference on Human-Computer Interaction 2017   713   95 - 102   2017.07

    Research paper (international conference proceedings)   Multiple Authorship

    The visual features of Japanese Sign Language are divided into two types: manual signals and non-manual signals. Manual signals are represented by the shape and motion of the hands, and mainly convey the meaning of sign language words. In terms of phonology, sign language words consist of three elements: the hand's motion, position, and shape. We have developed a recognition system for Japanese Sign Language (JSL) that abstracts manual signals based on these three elements. The abstraction of manual signals is performed based on a dictionary of Japanese Sign Language words. Features such as hand coordinates and depth images are extracted from manual signals using a depth sensor, Kinect v2. The system recognizes the three elements independently, and the final result is obtained by a comprehensive judgment over the three recognition results. In this paper, we used two methods for recognition of hand shape: a contour-based method suggested by Keogh and template matching of depth images. For the other elements, we used a hidden Markov model for recognition of motion and a normal distribution learned by maximum likelihood estimation for recognition of position, in the same manner as in our previous research. Based on our proposed method, we prepared recognition methods for each element and conducted a recognition experiment on 400 sign language words based on a sign language word dictionary.

  • Real-Time Japanese Sign Language Recognition Based on Three Phonological Elements of Sign

    Shinji Sako, Mika Hatano, Tadashi Kitamura

    18th International Conference HCI International 2016, Communications in Computer and Information Science   618   130 - 136   2016.06  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    Sign language is the visual language of deaf people. It is also a natural language, different in form from spoken language. To resolve the communication barrier between hearing and deaf people, several research efforts on automatic sign language recognition (ASLR) systems are now under way. However, existing ASLR research deals with only small vocabularies. It is also limited in environmental conditions and the equipment used. In addition, compared with the research field of speech recognition, there is no large-scale sign database, for various reasons. One of the major reasons is that there is no official writing system for Japanese Sign Language (JSL). In this situation, we focused on using knowledge of JSL phonology and a dictionary in order to develop a real-time JSL sign recognition system. The dictionary consists of over 2,000 JSL signs, each defined by three types of phonological elements in JSL: hand shape, motion, and position. Thanks to the dictionary, JSL sign models are represented by combinations of these elements. The system can also accommodate the addition of new signs. Our system employs a Kinect v2 sensor to obtain sign features such as hand shape, position, and motion. The depth sensor enables real-time processing and robustness against environmental changes. In general, recognition of hand shape is not easy in the field of ASLR due to the complexity of hand shapes. In our research, we apply a contour-based method to hand shape recognition. To recognize hand motion and position, we adopted statistical models such as hidden Markov models (HMMs) and Gaussian mixture models (GMMs). To address the lack of a database, our method utilizes pseudo motion and hand shape data. We conduct experiments to recognize 223 JSL signs, targeting professional sign language interpreters.

  • Automatic Performance Rendering Method for Keyboard Instruments based on Statistical Model that Associates Performance Expression and Musical Notation

    Kenta Okumura, Shinji Sako, Tadashi Kitamura

    Journal of Japan Society for Fuzzy Theory and Intelligent Informatics ( Japan Society for Fuzzy Theory and Intelligent Informatics )  28 ( 2 ) 557 - 569   2016.04  [Refereed]

    Research paper (scientific journal)   Multiple Authorship

    This paper proposes a method for the automatic rendition of performances without losing any characteristics of the specific performer. Many existing methods require users to input expertise such as that possessed by the performer. Although they are useful in support of users' own performances, they are not suitable for the purpose of this proposal. The proposed method defines a model that associates the feature quantities of expression extracted from cases of actual performance with the directions that can be reliably retrieved from the musical score without using expertise. By classifying the expressive tendency of the model for each case of performance using criteria based on score directions, the rules that elucidate the causal relationship between the performer's specific performance expression and the score directions can be structured systematically. The candidate performance cases corresponding to unseen score directions are obtained by tracing this structure. Dynamic programming is applied to solve the problem of searching for the sequence of performance cases with the optimal expression from among these candidates. Objective evaluations indicated that the proposed method is able to efficiently render optimal performances. Subjective evaluations confirmed the quality of the expression rendered by the proposed method. It was also shown that the characteristics of the performer could be reproduced even in various compositions. Furthermore, performances rendered via the proposed method have won the first prize in the autonomous section of a performance rendering contest for computer systems.

  • Comparative Analysis of Performance Expression using Similarity Metrics based on Statistical Model and Musical Score Information

    Kenta Okumura, Shinji Sako, Tadashi Kitamura

    Transactions of Japan Society of Kansei Engineering   15 ( 1 ) 255 - 263   2016.02  [Refereed]

    Research paper (scientific journal)   Multiple Authorship


Review Papers

  • HMM-based Automatic Sign Language Recognition using Phonemic Structure of Japanese Sign Language

    Shinji Sako, Tadashi Kitamura

    Journal of The Japan Society for Welfare Engineering ( Japan Society for Welfare Engineering )  17 ( 2 ) 2 - 7   2015.11

    Introduction and explanation (international conference proceedings)   Multiple Authorship

  • Speech/Sound based Human Interfaces (1) Construction of Speech Synthesis Systems using HTS

    Keiichiro Oura, Heiga Zen, Shinji Sako, Keiichi Tokuda

    Human Interface ( Human Interface Society )  12 ( 1 ) 35 - 40   2010.02  [Refereed]

    Introduction and explanation (international conference proceedings)   Multiple Authorship

Presentations


Work

  • Ryry: Automatic Accompaniment System Capable of Polyphonic Instruments

    Software  2013.03  -  2013.03

Academic Awards Received

  • Japan Society for Fuzzy Theory and Intelligent Informatics Best Paper Award

    2017.09.14    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

  • 78th National Convention of IPSJ, Student Encouragement Award

    2016.03.11    

    Winner: Naoto Sato, Shinji Sako, Tadashi Kitamura

  • IPSJ Yamashita SIG Research Award

    2016.03    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

  • 76th National Convention of IPSJ, Student Encouragement Award

    2014.03.13    

    Winner: Ai Zukawa, Shinji Sako, Tadashi Kitamura

  • 76th National Convention of IPSJ, Student Encouragement Award

    2014.03.13    

    Winner: Kana Miyata, Shinji Sako, Tadashi Kitamura

  • Acoustical Society of Japan, Tokai Branch, Best Presentation Award

    2013.09    

    Winner: Ryuichi Yamamoto, Shinji Sako, Tadashi Kitamura

  • Forum on Information Technology Encouragement Award 2013

    2013.09    

    Winner: Wakana Nagata, Shinji Sako, Tadashi Kitamura

  • IPSJ Tokai Branch, Student Paper Encouragement Award

    2013.05.19    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

    This paper presents a method for describing the characteristics of human musical performance. We consider the problem of building models that express the ways in which deviations from a strict interpretation of the score occur in the performance, and that cluster these deviations automatically. The clustering process is performed using expressive representations unambiguously notated on the musical score, without any arbitrariness on the part of a human observer. The result of clustering is obtained as hierarchical tree structures for each deviational factor that occurred during the operation of the instrument. This structure represents an approximation of the performer's interpretation, using information notated on the score they used during the performance. By applying the method to data measured from real performances, we show that the use of information regarding expressive representation on the musical score enables efficient estimation of a generative model for the musical performance. In addition, the method is useful for objectively verifying existing knowledge about musical performance, because our model yields information that supports such knowledge.

  • Tokai-Section Joint Conference on Electrical and Related Engineering, Encouragement Award

    2013.01.22    

    Winner: Wakana Nagata, Shinji Sako, Tadashi Kitamura

  • Acoustical Society of Japan, Tokai Branch, Best Presentation Award

    2012.12.12    

    Winner: Ryuichi Yamamoto, Shinji Sako, Tadashi Kitamura


Academic Activity

  • 2019.06
    -
    Now

    The Institute of Electronics, Information and Communication Engineers  

  • 2019.04
    -
    2020.04

    Information Processing Society of Japan  

  • 2019.01
    -
    Now

    The Institute of Electronics, Information and Communication Engineers  

  • 2018.12
    -
    2019.09

    The Institute of Electronics, Information and Communication Engineers  

  • 2018.07
    -
    2019.01

    The Institute of Electronics, Information and Communication Engineers  

  • 2017.07
    -
    2018.01

    The Institute of Electronics, Information and Communication Engineers  

  • 2016.06
    -
    Now

    The Institute of Electronics, Information and Communication Engineers   Language as Real-time Communication, Executive secretary

  • 2015.06
    -
    2019.05

    The Institute of Electronics, Information and Communication Engineers  

  • 2015.06
    -
    2019.05

    The Institute of Electronics, Information and Communication Engineers  

  • 2015.01
    -
    2016.01

    The Institute of Electronics, Information and Communication Engineers  
