SAKO Shinji

Affiliation Department etc.

Department of Computer Science
Department of Computer Science
Center for Research on Assistive Technology for Building a New Community

Title

Associate Professor

Graduating School

  • 1995.04
    -
    1999.03

    Nagoya Institute of Technology   Faculty of Engineering   Graduated

Graduate School

  • 2001.04
    -
    2004.03

    Nagoya Institute of Technology   Graduate School, Division of Engineering   Department of Electrical & Computer Engineering   Doctor's Course   Completed

External Career

  • 2016.07
    -
    2017.03

    Technical University of Munich   Institute for Human-Machine Communication   Researcher  

  • 2014.07
    -
    2014.08

    AGH University of Science and Technology   Faculty of Computer Science, Electronics and Telecommunications   Guest Scientist

  • 2012.06
    -
    2012.12

    Technical University of Munich   Institute for Human-Machine Communication   Guest Scientist

  • 2004.04
    -
    2007.03

    The University of Tokyo   Graduate School of Information Science and Technology   Research Assistant  

  • 2003.04
    -
    2003.06

    Advanced Telecommunications Research Institute International  

Academic Society Affiliations

  • 2010.06
    -
    Now

    Japanese Association of Sign Linguistics

  • 2010.06
    -
    Now

    Human Interface Society

  • 2007.10
    -
    Now

    The Institute of Image Information and Television Engineers

  • 2005.10
    -
    Now

    The Japanese Society for Artificial Intelligence

  • 2001.03
    -
    Now

    Acoustical Society of Japan

Field of expertise (Grants-in-aid for Scientific Research classification)

  • Rehabilitation science/Welfare engineering

  • Kansei informatics

  • Perceptual information processing

 

Thesis for a degree

  • Audio-Visual Speech/Singing-voice Synthesis and Gesture Recognition for Multimodal Human Computer Interaction

    Shinji Sako 

      2004.03

    8   1

Papers

  • 3D skeleton motion generation of double bass from musical score

    Teppei Miura, Shinji Sako

    15th International Symposium on Computer Music Multidisciplinary Research (CMMR)     41 - 46   2021.11  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    In this study, we propose a method for generating 3D skeleton motions of a double bass player from musical score information using a 2-layer LSTM network. Since no suitable dataset existed for this study, we created a new motion dataset from actual double bass performances. The contribution of this paper is to show the effect of combining bowing and fingering information in the generation of performance motion, and to examine effective model structures for performance generation. Both objective and subjective evaluations showed that the accuracy of the generated double bass performance motion improves when the two types of additional information (bowing and fingering) are used, and also when the model is constructed to take bowing and fingering into account.

  • SynSLaG: Synthetic Sign Language Generator

    Teppei Miura, Shinji Sako

    ASSETS '21: The 23rd International ACM SIGACCESS Conference on Computers and Accessibility ( Association for Computing Machinery )  ( 90 ) 1 - 4   2021.10  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    Machine learning techniques have the potential to play an important role in sign language recognition. However, sign language datasets lack the volume and variety necessary to work well. To enlarge these datasets, we introduce SynSLaG, a tool that synthetically generates sign language datasets from 3D motion capture data. SynSLaG generates realistic images of various body shapes with ground truth 2D/3D poses, depth maps, body-part segmentations, optical flows, and surface normals. The large synthetic datasets provide possibilities for advancing sign language recognition and analysis.

  • Recognition of JSL fingerspelling using Deep Convolutional Neural Networks

    Bogdan Kwolek, Wojciech Baczynski, Shinji Sako

    Neurocomputing     2021.06  [Refereed]

    Research paper (scientific journal)   Multiple Authorship

    In this paper, we present an approach for recognition of static fingerspelling in Japanese Sign Language on RGB images. Two 3D articulated hand models have been developed to generate synthetic fingerspellings and to extend a dataset consisting of real hand gestures. In the first approach, advanced graphics techniques were employed to rasterize photorealistic gestures using a skinned hand model. In the second approach, gestures rendered using simpler lighting techniques were post-processed by a modified Generative Adversarial Network. To avoid generating unrealistic fingerspellings, a hand segmentation term has been added to the loss function of the GAN. The segmentation of the hand in images with complex backgrounds was done by the proposed ResNet34-based segmentation network. The finger-spelled signs were recognized by an ensemble of neural networks, some fine-tuned and some trained from scratch. Experimental results demonstrate that, owing to a sufficient amount of training data, a high recognition rate can be attained on RGB images. The JSL dataset with pixel-level hand segmentations is available for download.

  • Fingerspelling recognition using synthetic images and deep transfer learning

    Nguyen Tu Nam, Shinji Sako, Bogdan Kwolek

    2020 The 13th International Conference on Machine Vision (ICMV 2020)     2020.11  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    Although gesture recognition has been intensely studied for decades, it is still a challenging research topic due to difficulties posed by background complexity, occlusion, viewpoint, lighting changes, the deformable and articulated nature of hands, etc. Numerous studies have shown that extending a training dataset of real images with synthetic images improves the recognition accuracy. However, little work has been devoted to demonstrating what improvements in recognition can be achieved by transferring the style of real gestures onto synthetically generated images. In this paper, we propose a novel method for Japanese fingerspelling recognition using both real images and synthetic images generated on the basis of a 3D hand model. We propose to employ neural style transfer to carry information from real images over to the synthetically generated dataset. We demonstrate experimentally that neural style transfer and discriminative layer training, applied to the training of deep neural models, yield considerable gains in recognition accuracy.

  • Study on Effective Combination of Features for Non-word Speech Recognition of Phonological Examination

    Toshiharu Tadano, Masahiko Nawate, Fumihito Ito, Shinji Sako

    IPSJ Journal ( Information Processing Society of Japan )  61 ( 10 ) 1647 - 1657   2020.10  [Refereed]

    Research paper (scientific journal)   Multiple Authorship

    Developmental dyslexia is a major component of learning disability, and its early detection is very important for intervention and reading treatment. A convenient PC-based screening test has been published; it automatically records the answer times for text reading, reversed reading of words, and mora skipping of words. However, the correctness of each answer must be judged by the tester. To automate these tests, speech recognition technology that handles non-words (meaningless words used in the examination) is necessary, but conventional speech recognition has low recognition precision for non-words. Therefore, while reinforcing the functions of conventional speech recognition, the accuracy for non-words needs to be improved to a level that is practical for phonological examination. In this study, we tried to improve the accuracy for non-words by incorporating a mechanism to determine non-word correctness into Julius, which is open-source software and can be modified freely. In addition, six candidate speech features are considered, and the trend in accuracy across their combinations is examined. As a result, depending on the target non-word, the accuracy was 75.0% to 95.0%, and the overall average was 87.5%.

  • 3D human pose estimation model using location-maps for distorted and disconnected images by a wearable omnidirectional camera

    Teppei Miura, Shinji Sako

    IPSJ Transactions on Computer Vision and Applications ( Information Processing Society of Japan )  12 ( 4 ) 1 - 17   2020.08  [Refereed]

    Research paper (scientific journal)   Multiple Authorship

    We address 3D human pose estimation for equirectangular images taken by a wearable omnidirectional camera. The equirectangular image is distorted because the omnidirectional camera is attached closely in front of a person’s neck. Furthermore, some parts of the body are disconnected in the image; for instance, when a hand goes out past one edge of the image, it comes in from the opposite edge. The distortion and disconnection of images make 3D pose estimation challenging. To overcome this difficulty, we introduce the location-maps method proposed by Mehta et al.; however, the method has previously been used to estimate 3D human poses only for regular images without distortion and disconnection. We focus on a characteristic of the location-maps that can extend 2D joint locations to 3D positions with respect to 2D-3D consistency without considering kinematic model restrictions and optical properties. In addition, we collect a new dataset composed of equirectangular images and synchronized 3D joint positions for training and evaluation. We validate the location-maps’ capability to estimate 3D human poses for distorted and disconnected images. We propose a new location-maps-based model by replacing the backbone network with a state-of-the-art 2D human pose estimation model (HRNet). Our model has a simpler architecture than the reference model proposed by Mehta et al. Nevertheless, our model shows better performance with respect to accuracy and computational complexity. Finally, we analyze the location-maps method from two perspectives: the map variance and the map scale. The analysis reveals two characteristics of location-maps: (1) the map variance affects the robustness of extending 2D joint locations to 3D positions against 2D estimation error, and (2) the 3D position accuracy is related to the accuracy of the 2D locations relative to the map scale.

  • Deep CNN-Based Recognition of JSL Finger Spelling

    Nguyen Tu Nam, Shinji Sako, Bogdan Kwolek

    Hybrid Artificial Intelligent Systems (HAIS 2019), Lecture Notes in Computer Science ( Springer )  11734   602 - 613   2019.08  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    In this paper, we present a framework for recognition of static finger spelling in Japanese Sign Language on RGB images. The finger spelled signs were recognized by an ensemble consisting of a ResNet-based convolutional neural network and two ResNet quaternion convolutional neural networks. A 3D articulated hand model has been used to generate synthetic finger spellings and to extend a dataset consisting of real hand gestures. Twelve different gesture realizations were prepared for each of 41 signs. Ten images have been rendered for each realization through interpolations between the starting and end poses. Experimental results demonstrate that, owing to a sufficient amount of training data, a high recognition rate can be attained on images from a single RGB camera. Results achieved by the ResNet quaternion convolutional neural network are better than results obtained by the ResNet CNN. The best recognition results were achieved by the ensemble. The JSL-rend dataset is available for download.

  • Construction of a Japanese Sign Language Database with Various Data Types

    Keiko Watanabe, Yuji Nagashima, Daisuke Hara, Yasuo Horiuchi, Shinji Sako, Akira Ichikawa

    International Conference on Human-Computer Interaction 2019 ( Springer )  1032   317 - 322   2019.07  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

    We have constructed a sign language database that presents signs as 3D animations. We aim to build an interdisciplinary database that researchers in various academic fields can use to analyze Japanese Sign Language. We have recorded nearly 2,000 Japanese signs to date, and we plan to record approximately 5,000 signs in the database. First, we selected frequently used Japanese words for the database and examined the sign language expression corresponding to each word. Second, we recorded 3D motion data of the selected sign language expressions using optical motion capture. The motion capture data are stored in C3D, BVH, and FBX formats at a frame rate of 120 fps. In addition, for use in sign language analysis, we recorded full HD video at 60 fps, super-slow HD data at 30 fps, and depth data at 30 fps. These are recorded synchronously. We have also developed a new annotation system that can play back the different types of data synchronously, because data analysis requires synchronized playback of all data even though they were recorded at different frame rates.

  • Discussion of a Japanese sign language database and its annotation systems with consideration for its use in various areas

    Shinji Sako, Yuji Nagashima, Daisuke Hara, Yasuo Horiuchi, Keiko Watanabe, Ritsuko Kikusawa, Naoto Kato, Akira Ichikawa

    Proceedings of LingCologne 2019     2019.06  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

  • Constructing a Japanese Sign Language Multi-Dimensional Database

    Yuji Nagashima, Daisuke Hara, Shinji Sako, Keiko Watanabe, Yasuo Horiuchi, Ritsuko Kikusawa, Naoto Kato, Akira Ichikawa

    The 7th Meeting of Signed and Spoken Language Linguistics (SSLL 2018)     2018.09  [Refereed]

    Research paper (international conference proceedings)   Multiple Authorship

Books

  • Speech Communication and People with Disabilities

    Akira Ichikawa, Yuji Nagashima, Akira Okamoto, Naoto Kato, Shinji Sako, Tetsuya Takiguchi, Daisuke Hara, Michiru Makuuchi (Part: Multiple Authorship, Chapter 2 Speech and Communication Disorders)

    Corona Publishing  2021.07 ISBN: 9784339013429

Review Papers

  • HMM-based Automatic Sign Language Recognition using Phonemic Structure of Japanese Sign Language

    Shinji Sako, Tadashi Kitamura

    Journal of The Japan Society for Welfare Engineering ( Japan Society for Welfare Engineering )  17 ( 2 ) 2 - 7   2015.11

    Introduction and explanation (international conference proceedings)   Multiple Authorship

  • Speech/Sound based Human Interfaces (1) Construction of Speech Synthesis Systems using HTS

    Keiichiro Oura, Heiga Zen, Shinji Sako, Keiichi Tokuda

    Human Interface ( Human Interface Society )  12 ( 1 ) 35 - 40   2010.02  [Refereed]

    Introduction and explanation (international conference proceedings)   Multiple Authorship

Presentations

  • Music Mood Recognition Based on Synchronized Audio and Lyrics

    Sho Ikeda, Shinji Sako

    22nd International Society for Music Information Retrieval Conference  (Online)  2021.11  -  2021.11  International Society for Music Information Retrieval

  • Attribute-Aware Deep Music Transformation For Polyphonic Music

    Yuta Matsuoka, Shinji Sako

    22nd International Society for Music Information Retrieval Conference  (Online)  2021.11  -  2021.11  International Society for Music Information Retrieval

  • Dynamics Restoration for "Loud" Popular Music

    Hyuga Ozeki, Shinji Sako

    IPSJ 132nd Special Interest Group on MUSic and computer (SIGMUS)  (Online meeting)  2021.09  -  2021.09  Information Processing Society of Japan

  • A study on multi-part beat tracking for mixed music signal with timing discrepancy

    Kazuki Fukutani, Shinji Sako

    IPSJ 131st Special Interest Group on MUSic and computer (SIGMUS)  (Online meeting)  2021.06  -  2021.06  Information Processing Society of Japan

  • A Mobile MoCap System for Sign Language Recognition: Improving Accuracy of 3D Pose Estimation Using OpenPose

    Teppei Miura, Shinji Sako

    IEICE 112th Technical Committee on Well-being Information Technology (WIT)  (Online meeting (Zoom))  2021.06  -  2021.06  Institute of Electronics, Information and Communication Engineers

  • A study on multi-label beat tracking for mixed music signal with timing discrepancy

    Kazuki Fukutani, Shinji Sako

    IPSJ 129th Special Interest Group on MUSic and computer (SIGMUS)  (Online meeting)  2020.11  -  2020.11  Information Processing Society of Japan

  • 20 years of achievements and records of Well-being Information Technology (WIT)

    Shinji Sako

    IEICE 105th Technical Committee on Well-being Information Technology (WIT)  (Tsukuba University of Technology, Kasuga campus)  2020.03  -  2020.03  Institute of Electronics, Information and Communication Engineers

  • Automatic melody composition using listening history

    Yuta Matsuoka, Shinji Sako

    The 82nd National Convention of IPSJ  (Kanazawa Institute of Technology (Online presentation due to cancellation))  2020.03  -  2020.03  Information Processing Society of Japan

  • End-to-End Audio Source Separation applied to Guitar Extraction

    Hyuga Ozeki, Shinji Sako

    The 82nd National Convention of IPSJ  (Kanazawa Institute of Technology (Online presentation due to cancellation))  2020.03  -  2020.03  Information Processing Society of Japan

  • A study on music recommendation system using emotional elements and emotional intensity of speech

    Sho Ikeda, Shinji Sako

    The 82nd National Convention of IPSJ  (Kanazawa Institute of Technology (Online presentation due to cancellation))  2020.03  -  2020.03  Information Processing Society of Japan

Academic Awards Received

  • Student Encouraging Award

    2021.09.17    

    Winner: Hyuga Ozeki, Shinji Sako

  • Japan Society for Fuzzy Theory and Intelligent Informatics Best Paper Award

    2017.09.14    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

    This paper proposes a method for the automatic rendition of performances without losing any characteristics of the specific performer. Many existing methods require users to input expertise such as that possessed by the performer. Although such methods are useful for supporting users' own performances, they are not suitable for the purpose of this proposal. The proposed method defines a model that associates the feature quantities of expression extracted from cases of actual performance with the score directions, which can be reliably retrieved from the musical score without using expertise. By classifying the expressive tendencies of the model for each performance case using criteria based on score directions, rules that systematically elucidate the causal relationship between the performer's specific performance expression and the score directions can be structured. The candidate performance cases corresponding to unseen score directions are obtained by tracing this structure. Dynamic programming is applied to search for the sequence of performance cases with the optimal expression from among these candidates. Objective evaluations indicated that the proposed method is able to efficiently render optimal performances. Subjective evaluations confirmed the quality of the expression rendered by the proposed method. It was also shown that the characteristics of the performer could be reproduced even in various compositions. Furthermore, performances rendered via the proposed method won first prize in the autonomous section of a performance rendering contest for computer systems.

  • 78th National Convention of IPSJ, Student Encouragement Award

    2016.03.11    

    Winner: Naoto Sato, Shinji Sako, Tadashi Kitamura

  • IPSJ Yamashita SIG Research Award

    2016.03    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

  • 76th National Convention of IPSJ, Student Encouragement Award

    2014.03.13    

    Winner: Ai Zukawa, Shinji Sako, Tadashi Kitamura

  • 76th National Convention of IPSJ, Student Encouragement Award

    2014.03.13    

    Winner: Kana Miyata, Shinji Sako, Tadashi Kitamura

  • Acoustical Society of Japan, Tokai Branch, Best Presentation Award

    2013.09    

    Winner: Ryuichi Yamamoto, Shinji Sako, Tadashi Kitamura

  • Forum on Information Technology Encouragement Award 2013

    2013.09    

    Winner: Wakana Nagata, Shinji Sako, Tadashi Kitamura

  • IPSJ Tokai Branch, Student Paper Encouragement Award

    2013.05.19    

    Winner: Kenta Okumura, Shinji Sako, Tadashi Kitamura

    This paper presents a method for describing the characteristics of human musical performance. We consider the problem of building models that express the ways in which deviations from a strict interpretation of the score occur in the performance, and that cluster these deviations automatically. The clustering process is performed using expressive representations unambiguously notated on the musical score, without any arbitrariness introduced by a human observer. The result of clustering is obtained as hierarchical tree structures for each deviational factor that occurred during the operation of the instrument. This structure represents an approximation of the performer's interpretation of the information notated on the score used during the performance. By applying the method to data measured from real performances, we show that using information about expressive representation on the musical score enables efficient estimation of a generative model for the musical performance. In addition, the method is useful for objectively verifying existing knowledge about musical performance, since our model yields information that supports such knowledge.

  • Tokai-Section Joint Conference on Electrical and Related Engineering, Encouragement Award

    2013.01.22    

    Winner: Wakana Nagata, Shinji Sako, Tadashi Kitamura

 
 

Academic Activity

  • 2021.04
    -
    Now

    The Institute of Electronics, Information and Communication Engineers   Language as Real-time Communication, Chairman

  • 2021.04
    -
    Now

    The Institute of Electronics, Information and Communication Engineers   IEICE Well-being Information Technology (WIT), Chairman

  • 2020.03
    -
    Now

    The Institute of Electronics, Information and Communication Engineers  

  • 2019.04
    -
    2021.03

    The Institute of Electronics, Information and Communication Engineers   IEICE Well-being Information Technology (WIT), Vice Chairman

  • 2019.04
    -
    Now

    Information Processing Society of Japan  

  • 2019.01
    -
    2020.02

    The Institute of Electronics, Information and Communication Engineers  

  • 2018.12
    -
    2019.09

    The Institute of Electronics, Information and Communication Engineers  

  • 2018.07
    -
    2019.01

    The Institute of Electronics, Information and Communication Engineers  

  • 2018.04
    -
    Now

    Information Processing Society of Japan   IPSJ Special Interest Group on Music and Computer

  • 2017.07
    -
    2018.01

    The Institute of Electronics, Information and Communication Engineers  
