Papers - TAMAKI Toru

Displaying 1 - 20 of about 35  /  Display all >>
  • Can masking background and object reduce static bias for zero-shot action recognition? Reviewed

    Takumi Fukuzawa, Kensho Hara, Hirokatsu Kataoka, Toru Tamaki

    The 31st International Conference on MultiMedia Modeling (MMM2025)   2025.01

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)  

  • Online Pre-Training With Long-Form Videos Reviewed

    Itsuki Kato, Kodai Kamiya, Toru Tamaki

    Proc. of 2024 IEEE 13th Global Conference on Consumer Electronics (GCCE 2024)   2024.10

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.48550/arXiv.2408.15651

    Other Link: https://doi.org/10.48550/arXiv.2408.15651

  • Shift and matching queries for video semantic segmentation

    Tsubasa Mizuno, Toru Tamaki

    arXiv   1 - 12   2024.10

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (conference, symposium, etc.)  

    DOI: 10.48550/arXiv.2410.07635

    Other Link: https://arxiv.org/abs/2410.07635

  • Query matching for spatio-temporal action detection with query-based object detector

    Shimon Hori, Kazuki Omi, Toru Tamaki

    arXiv   1 - 5   2024.09

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (conference, symposium, etc.)  

    DOI: 10.48550/arXiv.2409.18408

    Other Link: https://arxiv.org/abs/2409.18408

  • Fine-grained length controllable video captioning with ordinal embeddings

    Tomoya Nitta, Takumi Fukuzawa, Toru Tamaki

    arXiv   1 - 29   2024.08

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (other academic)  

    DOI: 10.48550/arXiv.2408.15447

    Other Link: https://arxiv.org/abs/2408.15447

  • Data Augmentation for Action Recognition Using Segmentation and Image Translation Invited

    Taiki Sugiura, Toru Tamaki

    画像ラボ   35 ( 6 )   7 - 15   2024.06

     More details

    Authorship:Last author   Language:Japanese   Publishing type:Research paper (scientific journal)  

    Other Link: https://www.nikko-pb.co.jp/products/detail.php?product_id=5773

  • S3Aug: Segmentation, Sampling, and Shift for Action Recognition Reviewed

    Taiki Sugiura, Toru Tamaki

    Computer Vision, Imaging and Computer Graphics Theory and Applications (VISAPP2024)   2   71 - 79   2024.02

     More details

    Authorship:Last author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)  

    Action recognition is a well-established area of research in computer vision. In this paper, we propose S3Aug, a video data augmentation method for action recognition. Unlike conventional video data augmentation methods that involve cutting and pasting regions from two videos, the proposed method generates new videos from a single training video through segmentation and label-to-image transformation. Furthermore, the proposed method modifies certain categories of label images by sampling to generate a variety of videos, and shifts intermediate features to enhance the temporal coherency between frames of the generated videos. Experimental results on the UCF101, HMDB51, and Mimetics datasets demonstrate the effectiveness of the proposed method, particularly for out-of-context videos of the Mimetics dataset. (An illustrative code sketch follows this entry.)

    DOI: 10.5220/0012310400003660

    Other Link: https://www.scitepress.org/Link.aspx?doi=10.5220/0012310400003660
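
    The following is a minimal, hypothetical sketch of the S3Aug pipeline summarized in the abstract above: segmentation of each frame, re-sampling of selected label categories, and label-to-image generation. The segment() and label_to_image() functions, the category set, and all sizes are placeholders standing in for the actual segmentation and image-synthesis models, and the feature-shift step is omitted.

        # Hypothetical sketch of the S3Aug pipeline; random stand-ins replace the real
        # segmentation network and label-to-image generator so the control flow runs on its own.
        import numpy as np

        rng = np.random.default_rng(0)
        N_CLASSES = 20        # assumed number of segmentation categories
        SAMPLED = {5, 7}      # assumed "certain categories" whose labels are re-sampled

        def segment(frame):
            """Stand-in for a segmentation model: per-pixel category labels."""
            return rng.integers(0, N_CLASSES, size=frame.shape[:2])

        def label_to_image(label_map):
            """Stand-in for a label-to-image (semantic image synthesis) generator."""
            return rng.random(label_map.shape + (3,)).astype(np.float32)

        def sample_labels(label_map):
            """Re-sample selected categories to other categories to diversify the output."""
            out = label_map.copy()
            for c in SAMPLED:
                out[out == c] = rng.integers(0, N_CLASSES)
            return out

        def s3aug(video):
            """Generate one augmented video from a single training video (no clip mixing)."""
            aug = []
            for frame in video:
                labels = sample_labels(segment(frame))  # Segmentation + Sampling
                aug.append(label_to_image(labels))      # label-to-image transformation
            return np.stack(aug)                        # (feature Shift step not shown)

        video = rng.random((8, 112, 112, 3)).astype(np.float32)  # T x H x W x C dummy clip
        print(s3aug(video).shape)                                # (8, 112, 112, 3)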

  • Multi-model learning by sequential reading of untrimmed videos for action recognition Reviewed

    Kodai Kamiya, Toru Tamaki

    Proc. of The International Workshop on Frontiers of Computer Vision (IW-FCV2024)   2024.02

     More details

    Authorship:Last author   Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.48550/arXiv.2401.14675

    Other Link: https://doi.org/10.48550/arXiv.2401.14675

  • Visualization Algorithms of Colorectal NBI Endoscopy Images for Computer-aided Diagnosis

    Toru Tamaki, Daisuke Katayama, Yongfei Wu, Tetsushi Koide, Shigeto Yoshida, Shin Morimoto, Yuki Okamoto, Shiro Oka, Shinji Tanaka

    Proc. of The 8th International Symposium on Biomedical Engineering & International Workshop on Nanodevice Technologies 2023   2023.11

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • A Two-Stage Real Time Diagnosis System for Lesion Recognition in Colon NBI Endoscopy

    Yongfei Wu, Daisuke Katayama, Tetsushi Koide, Toru Tamaki, Shigeto Yoshida, Shin Morimoto, Yuki Okamoto, Shiro Oka, Shinji Tanaka, Masayuki Odagawa, Toshihiko Sugihara

    Proc. of The 8th International Symposium on Biomedical Engineering & International Workshop on Nanodevice Technologies 2023   2023.11

     More details

    Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • A Lesion Recognition System Using Single FCN for Indicating Detailed Inference Results in Colon NBI Endoscopy

    Yongfei Wu, Daisuke Katayama, Tetsushi Koide, Toru Tamaki, Shigeto Yoshida, Shin Morimoto, Yuki Okamoto, Shiro Oka, Shinji Tanaka, Masayuki Odagawa, Toshihiko Sugihara

    Proc. of The 8th International Symposium on Biomedical Engineering & International Workshop on Nanodevice Technologies 2023   2023.11

     More details

    Authorship:Lead author   Language:English   Publishing type:Research paper (conference, symposium, etc.)  

  • Joint learning of images and videos with a single Vision Transformer Reviewed International journal

    Shuki Shimizu, Toru Tamaki

    Proc. of 18th International Conference on Machine Vision and Applications (MVA)   1 - 6   2023.08

     More details

    Authorship:Last author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)  

    In this study, we propose a method for jointly learning images and videos using a single model. In general, images and videos are often trained with separate models. In this paper we propose a method that takes a batch of images as input to a Vision Transformer (IV-ViT), and also a set of video frames with temporal aggregation by late fusion. Experimental results on two image datasets and two action recognition datasets are presented. (A sketch follows this entry.)

    DOI: 10.23919/MVA57639.2023.10215661

    Other Link: https://ieeexplore.ieee.org/document/10215661/authors#authors
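
    Below is a small sketch, under stated assumptions, of the joint image/video training idea from the entry above: one shared backbone processes images directly and video frames frame-wise, with late fusion (temporal averaging) before the classification head. A tiny MLP stands in for the Vision Transformer so the example runs without pretrained weights; shapes and class counts are arbitrary.

        # Minimal sketch (not the authors' code) of joint learning of images and videos
        # with a single shared backbone and late-fusion temporal aggregation.
        import torch
        import torch.nn as nn

        class SharedBackbone(nn.Module):
            def __init__(self, dim=128, n_classes=10):
                super().__init__()
                # stand-in for a ViT: flatten + linear + ReLU
                self.features = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, dim), nn.ReLU())
                self.head = nn.Linear(dim, n_classes)

            def forward_image(self, images):            # images: (B, 3, 32, 32)
                return self.head(self.features(images))

            def forward_video(self, videos):            # videos: (B, T, 3, 32, 32)
                b, t = videos.shape[:2]
                frames = videos.flatten(0, 1)           # frame-wise feature extraction
                feats = self.features(frames).view(b, t, -1)
                return self.head(feats.mean(dim=1))     # late fusion over time

        model = SharedBackbone()
        print(model.forward_image(torch.randn(4, 3, 32, 32)).shape)     # torch.Size([4, 10])
        print(model.forward_video(torch.randn(2, 8, 3, 32, 32)).shape)  # torch.Size([2, 10])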

  • Vision Transformer with Shift-Based Temporal Cross-Attention for Efficient Action Recognition

    Ryota Hashiguchi, Toru Tamaki

    画像ラボ   34 ( 5 )   9 - 16   2023.05

     More details

    Authorship:Last author, Corresponding author   Language:Japanese   Publishing type:Research paper (bulletin of university, research institution)  

    We propose Multi-head Self/Cross-Attention (MSCA), which introduces a temporal cross-attention mechanism for efficient action recognition. It is efficient, adding no extra computation, and its structure is well suited to extending ViT in the temporal direction. Experiments on Kinetics400 demonstrate the effectiveness of the proposed method and its superiority over conventional methods.

    Other Link: https://www.nikko-pb.co.jp/products/detail.php?product_id=5529

  • Object-ABN: Learning to Generate Sharp Attention Maps for Action Recognition Reviewed

    Tomoya Nitta, Tsubasa Hirakawa, Hironobu Fujiyoshi, Toru Tamaki

    IEICE Transactions on Information and Systems   E106-D ( 3 )   391 - 400   2023.03

     More details

    Authorship:Last author, Corresponding author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:The Institute of Electronics, Information and Communication Engineers  

    In this paper we propose an extension of the Attention Branch Network (ABN) by using instance segmentation for generating sharper attention maps for action recognition. Methods for visual explanation such as Grad-CAM usually generate blurry maps which are not intuitive for humans to understand, particularly in recognizing actions of people in videos. Our proposed method, Object-ABN, tackles this issue by introducing a new mask loss that makes the generated attention maps close to the instance segmentation result. Further, the Prototype Conformity (PC) loss and multiple attention maps are introduced to enhance the sharpness of the maps and improve the performance of classification. Experimental results with UCF101 and SSv2 show that the maps generated by the proposed method are much clearer qualitatively and quantitatively than those of the original ABN. (An illustrative sketch of the mask loss follows this entry.)

    DOI: 10.1587/transinf.2022EDP7138

    Other Link: https://www.jstage.jst.go.jp/article/transinf/E106.D/3/E106.D_2022EDP7138/_article
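
    The snippet below illustrates the mask-loss idea described in the abstract above: penalizing the distance between an attention map and an instance segmentation mask so that the map becomes sharp around the actor. The L1 form, the resizing step, and the tensor shapes are assumptions for illustration, not necessarily the exact formulation in the paper.

        # Hedged sketch of a mask loss between an attention map and an instance mask.
        import torch
        import torch.nn.functional as F

        def mask_loss(attention, instance_mask):
            """attention: (B, 1, h, w) in [0, 1]; instance_mask: (B, 1, H, W) binary."""
            attention = F.interpolate(attention, size=instance_mask.shape[-2:],
                                      mode="bilinear", align_corners=False)
            return F.l1_loss(attention, instance_mask.float())

        att = torch.rand(2, 1, 14, 14)                 # coarse map from the attention branch
        mask = torch.rand(2, 1, 224, 224) > 0.5        # stand-in for an instance segmentation
        print(mask_loss(att, mask).item())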

  • ObjectMix: Data Augmentation by Copy-Pasting Objects in Videos for Action Recognition Reviewed International journal

    Jun Kimata, Tomoya Nitta, Toru Tamaki

    ACM MM 2022 Asia (MMAsia '22)   2022.12

     More details

    Authorship:Last author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)  

    DOI: 10.1145/3551626.3564941

    Other Link: https://doi.org/10.1145/3551626.3564941

  • Temporal Cross-attention for Action Recognition Reviewed International journal

    Ryota Hashiguchi, Toru Tamaki

    2022.12

     More details

    Authorship:Last author, Corresponding author   Language:English   Publishing type:Research paper (international conference proceedings)  

    Feature shifts have been shown to be useful for action recognition with CNN-based models since the Temporal Shift Module (TSM) was proposed. It is based on frame-wise feature extraction with late fusion, and layer features are shifted along the time direction for the temporal interaction. TokenShift, a recent model based on Vision Transformer (ViT), also uses the temporal feature shift mechanism, which, however, does not fully exploit the structure of Multi-head Self-Attention (MSA) in ViT. In this paper, we propose Multi-head Self/Cross-Attention (MSCA), which fully utilizes the attention structure. TokenShift is based on a frame-wise ViT with features temporally shifted with successive frames (at time t+1 and t-1). In contrast, the proposed MSCA replaces MSA in the frame-wise ViT, and some MSA heads attend to successive frames instead of the current frame. The computation cost is the same as that of the frame-wise ViT and TokenShift, as it simply changes the target to which the attention is applied. There is a choice of which of the key, query, and value are taken from the successive frames, so we experimentally compare these variants on Kinetics400. We also investigate other variants in which the proposed MSCA is used along the patch dimension of ViT, instead of the head dimension. Experimental results show that the MSCA-KV variant performs best, outperforming TokenShift by 0.1% and ViT by 1.2%. (An illustrative sketch follows this entry.)

    Other Link: https://openaccess.thecvf.com/menu_other.html
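
    The sketch below is one possible reading of the MSCA-KV variant described in the abstract above: for tokens of a frame-wise ViT with shape (B, T, N, C), a subset of attention heads take their keys and values from the previous or next frame by shifting features along time, at the same cost as ordinary self-attention. The specific head assignment and dimensions are assumptions, not the published implementation.

        # Illustrative Multi-head Self/Cross-Attention (MSCA-KV style) layer.
        import torch
        import torch.nn as nn

        class MSCA(nn.Module):
            def __init__(self, dim=64, heads=4):
                super().__init__()
                assert heads >= 3, "need cross-attention heads plus at least one self head"
                self.heads, self.hd = heads, dim // heads
                self.qkv = nn.Linear(dim, dim * 3)
                self.proj = nn.Linear(dim, dim)

            def forward(self, x):                       # x: (B, T, N, C)
                b, t, n, c = x.shape
                q, k, v = self.qkv(x).chunk(3, dim=-1)
                def split(z):                           # -> (B, heads, T, N, hd)
                    return z.view(b, t, n, self.heads, self.hd).permute(0, 3, 1, 2, 4)
                q, k, v = split(q), split(k).clone(), split(v).clone()
                # head 0 attends to frame t-1, head 1 to frame t+1, the rest stay self-attention
                k[:, 0], v[:, 0] = k[:, 0].roll(1, dims=1), v[:, 0].roll(1, dims=1)
                k[:, 1], v[:, 1] = k[:, 1].roll(-1, dims=1), v[:, 1].roll(-1, dims=1)
                attn = (q @ k.transpose(-2, -1)) / self.hd ** 0.5
                out = (attn.softmax(dim=-1) @ v).permute(0, 2, 3, 1, 4).reshape(b, t, n, c)
                return self.proj(out)

        x = torch.randn(2, 8, 197, 64)                  # (batch, frames, tokens, dim)
        print(MSCA()(x).shape)                          # torch.Size([2, 8, 197, 64])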

  • Model-agnostic Multi-Domain Learning with Domain-Specific Adapters for Action Recognition Reviewed International journal

    Kazuki Omi, Jun Kimata, Toru Tamaki

    IEICE Transactions on Information and Systems   E105-D ( 12 )   2022.12

     More details

    Authorship:Last author, Corresponding author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEICE  

    In this paper, we propose a multi-domain learning model for action recognition. The proposed method inserts domain-specific adapters between the domain-independent layers of a backbone network. Unlike a multi-head network that switches classification heads only, our model switches not only the heads but also the adapters, to facilitate learning feature representations that are universal across multiple domains. Unlike prior works, the proposed method is model-agnostic and does not assume a particular model structure. Experimental results on three popular action recognition datasets (HMDB51, UCF101, and Kinetics-400) demonstrate that the proposed method is more effective than a multi-head architecture and more efficient than separately training models for each domain. (A sketch follows this entry.)

    DOI: 10.1587/transinf.2022EDP7058

    Other Link: https://search.ieice.org/bin/summary_advpub.php?id=2022EDP7058&category=D&lang=E&abst=
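
    Below is a small sketch, under assumptions, of the adapter arrangement described above: shared (domain-independent) blocks interleaved with per-domain adapters, plus a per-domain classification head selected at forward time. Plain linear layers stand in for the backbone and adapters; the dataset names and dimensions are illustrative only.

        # Illustrative multi-domain model with domain-specific residual adapters and heads.
        import torch
        import torch.nn as nn

        class MultiDomainNet(nn.Module):
            def __init__(self, domains=("hmdb51", "ucf101", "kinetics400"),
                         n_classes=(51, 101, 400), dim=64, n_blocks=3):
                super().__init__()
                self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_blocks)])
                # the adapters and heads are the only domain-specific parameters
                self.adapters = nn.ModuleDict({
                    d: nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_blocks)])
                    for d in domains})
                self.heads = nn.ModuleDict(
                    {d: nn.Linear(dim, c) for d, c in zip(domains, n_classes)})

            def forward(self, x, domain):
                for block, adapter in zip(self.blocks, self.adapters[domain]):
                    x = torch.relu(block(x))            # shared, domain-independent layer
                    x = x + adapter(x)                  # residual domain-specific adapter
                return self.heads[domain](x)            # domain-specific head

        model = MultiDomainNet()
        print(model(torch.randn(4, 64), "ucf101").shape)  # torch.Size([4, 101])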

  • Frontiers of Action and Activity Recognition: Methods, Tasks, and Datasets Invited

    Toru Tamaki

    画像応用技術専門委員会 研究会報告   34 ( 4 )   1 - 20   2022.11

     More details

    Authorship:Lead author, Last author, Corresponding author   Language:Japanese   Publishing type:Research paper (conference, symposium, etc.)  

    Other Link: http://www.tc-iaip.org/research/

  • Performance Evaluation of Action Recognition Models on Low Quality Videos Reviewed International journal

    Aoi Otani, Ryota Hashiguchi, Kazuki Omi, Norishige Fukushima, Toru Tamaki

    IEEE Access   10   94898 - 94907   2022.09

     More details

    Authorship:Last author, Corresponding author   Language:English   Publishing type:Research paper (scientific journal)   Publisher:IEEE  

    In the design of action recognition models, the quality of videos is an important issue; however, the trade-off between quality and performance is often ignored. In general, action recognition models are trained on high-quality videos, hence it is not known how the model performance degrades when tested on low-quality videos, nor how much the quality of training videos affects the performance. The issue of video quality is important; however, it has not been well studied so far. The goal of this study is to show the trade-off between the performance and the quality of training and test videos by quantitative performance evaluation of several action recognition models on transcoded videos of different qualities. First, we show how the video quality affects the performance of pre-trained models. We transcode the original validation videos of Kinetics400 by changing the quality control parameters of JPEG (compression strength) and H.264/AVC (CRF). Then we use the transcoded videos to validate the pre-trained models. Second, we show how the models perform when trained on transcoded videos. We transcode the original training videos of Kinetics400 by changing the quality parameters of JPEG and H.264/AVC. Then we train the models on the transcoded training videos and validate them with the original and transcoded validation videos. Experimental results with JPEG transcoding show that there is no severe performance degradation (up to −1.5%) for compression strength smaller than 70, where no quality degradation is visually observed, while for strengths larger than 80 the performance degrades linearly with respect to the quality index. Experiments with H.264/AVC transcoding show that there is no significant performance loss (up to −1%) with CRF30, while the total size of video files is reduced to 30%. In summary, video quality does not have a large impact on the performance of action recognition models unless the quality degradation is severe and visible. This enables us to transcode the tr... (An illustrative transcoding sketch follows this entry.)

    DOI: 10.1109/ACCESS.2022.3204755

    Other Link: https://ieeexplore.ieee.org/document/9878331
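
    The sketch below illustrates the kind of transcoding setup described in the abstract above: H.264/AVC re-encoding with ffmpeg's CRF option and JPEG re-compression of extracted frames with Pillow. File paths, the CRF sweep, and the JPEG quality values are hypothetical; this is not the exact protocol or tooling used in the paper.

        # Illustrative transcoding helpers for a video-quality vs. accuracy study.
        import subprocess
        from PIL import Image

        def transcode_h264(src, dst, crf):
            # standard ffmpeg flags; -crf is H.264's constant rate factor (larger = lower quality)
            subprocess.run(["ffmpeg", "-y", "-i", src, "-c:v", "libx264", "-crf", str(crf), dst],
                           check=True)

        def recompress_jpeg(frame_png, out_jpg, quality):
            # Pillow's JPEG quality setting (larger = higher quality); note the paper reports
            # a "compression strength", which is not necessarily the same scale
            Image.open(frame_png).save(out_jpg, "JPEG", quality=quality)

        if __name__ == "__main__":
            for crf in (23, 30, 40):                    # hypothetical CRF sweep
                transcode_h264("val/clip.mp4", f"clip_crf{crf}.mp4", crf)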

  • Object-ABN: Learning to Generate Sharp Attention Maps for Action Recognition International journal

    Tomoya Nitta, Tsubasa Hirakawa, Hironobu Fujiyoshi, Toru Tamaki

    arXiv   2022.07

     More details

    Authorship:Last author, Corresponding author   Language:English   Publishing type:Research paper (other academic)  

    In this paper we propose an extension of the Attention Branch Network (ABN) by using instance segmentation for generating sharper attention maps for action recognition. Methods for visual explanation such as Grad-CAM usually generate blurry maps which are not intuitive for humans to understand, particularly in recognizing actions of people in videos. Our proposed method, Object-ABN, tackles this issue by introducing a new mask loss that makes the generated attention maps close to the instance segmentation result. Further, the Prototype Conformity (PC) loss and multiple attention maps are introduced to enhance the sharpness of the maps and improve the performance of classification. Experimental results with UCF101 and SSv2 show that the maps generated by the proposed method are much clearer qualitatively and quantitatively than those of the original ABN.

    DOI: 10.48550/arXiv.2207.13306

    Other Link: https://doi.org/10.48550/arXiv.2207.13306
