Navigating Open Set Scenarios for Skeleton-Based Action Recognition

Authors: Kunyu Peng, Cheng Yin, Junwei Zheng, Ruiping Liu, David Schneider, Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We assess the performance of seven established open-set approaches on our task and identify their limits and critical generalization issues when dealing with skeleton information. To address these challenges, we propose a distance-based cross-modality ensemble method that leverages the cross-modal alignment of skeleton joints, bones, and velocities to achieve superior open-set recognition performance. We refer to the key idea as CrossMax, an approach that utilizes a novel cross-modality mean max discrepancy suppression mechanism to align latent spaces during training and a cross-modality distance-based logits refinement method during testing. CrossMax outperforms existing approaches and consistently yields state-of-the-art results across all datasets and backbones. We will release the benchmark, code, and models to the community.
Researcher Affiliation | Collaboration | Kunyu Peng^1, Cheng Yin^1, Junwei Zheng^1, Ruiping Liu^1, David Schneider^1, Jiaming Zhang^1, Kailun Yang^2,*, M. Saquib Sarfraz^1,3, Rainer Stiefelhagen^1, Alina Roitberg^4. ^1 Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany; ^2 School of Robotics, Hunan University, China; ^3 Mercedes-Benz Tech Innovation, Germany; ^4 Institute for Artificial Intelligence, University of Stuttgart, Germany
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | We will release the benchmark, code, and models to the community. The benchmark and code can be found at https://github.com/KPeng9510/OS-SAR.
Open Datasets | Yes | This benchmark is derived from three public datasets for action recognition from body pose sequences: NTU60 (Shahroudy et al. 2016), NTU120 (Liu et al. 2020), and Toyota Smart Home (Dai et al. 2023), for which we formalize the open-set splits and an evaluation protocol.
Dataset Splits | Yes | Following the open-set recognition practices in image classification (Lu et al. 2022), we randomly sample sets of unseen classes and compute the averaged performance over five random splits. ... We build on the NTU60 (Shahroudy et al. 2016), NTU120 (Liu et al. 2020), and Toyota Smart Home (Dai et al. 2023) datasets for human action recognition from body pose sequences and adapt their splits to suit open set conditions.
Hardware Specification | Yes | Our method relies on PyTorch 1.8.0 and is trained with the SGD optimizer with learning rate (lr) 0.1, a step-wise lr scheduler with decay rate 0.1, steps for decay at {35, 55, 70}, weight decay 0.0004, and batch size 64 for 100 epochs on 4 Nvidia A100 GPUs with an Intel Xeon Gold 6230 processor.
Software Dependencies | Yes | Our method relies on PyTorch 1.8.0 and is trained with the SGD optimizer with learning rate (lr) 0.1, a step-wise lr scheduler with decay rate 0.1, steps for decay at {35, 55, 70}, weight decay 0.0004, and batch size 64 for 100 epochs on 4 Nvidia A100 GPUs with an Intel Xeon Gold 6230 processor.
Experiment Setup | Yes | Our method relies on PyTorch 1.8.0 and is trained with the SGD optimizer with learning rate (lr) 0.1, a step-wise lr scheduler with decay rate 0.1, steps for decay at {35, 55, 70}, weight decay 0.0004, and batch size 64 for 100 epochs on 4 Nvidia A100 GPUs with an Intel Xeon Gold 6230 processor. λ, Nk, and α are chosen as 0.1, 5, and 2.0, respectively.
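The step-wise schedule in the experiment setup (base lr 0.1, decayed by a factor of 0.1 at epochs 35, 55, and 70) can be reproduced with a few lines. This is a minimal sketch of the reported hyperparameters, not code from the authors' release:

```python
def lr_at_epoch(epoch, base_lr=0.1, decay=0.1, milestones=(35, 55, 70)):
    """Step-wise lr schedule as reported: base lr 0.1, multiplied by
    the decay rate 0.1 at each milestone epoch that has been reached."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= decay
    return lr
```

In a PyTorch training loop the same behavior is typically obtained with `torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[35, 55, 70], gamma=0.1)`.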
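The dataset-splits protocol (randomly sampling sets of unseen classes and averaging over five random splits) can be sketched as follows. The number of unseen classes per split is an illustrative placeholder, not a value taken from this excerpt:

```python
import random

def make_open_set_splits(num_classes=60, num_unseen=15, num_splits=5, seed=0):
    """Sample disjoint seen/unseen class sets for open-set evaluation,
    repeated over several random splits whose results are averaged.
    num_unseen=15 is a hypothetical value for illustration."""
    rng = random.Random(seed)  # fixed seed so splits are reproducible
    splits = []
    for _ in range(num_splits):
        unseen = set(rng.sample(range(num_classes), num_unseen))
        seen = set(range(num_classes)) - unseen
        splits.append((sorted(seen), sorted(unseen)))
    return splits
```

With `num_classes=60` this mirrors an NTU60-style setup; the same helper applies to NTU120 by changing `num_classes`.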
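The abstract mentions a cross-modality mean max discrepancy suppression mechanism for aligning the latent spaces of the joint, bone, and velocity streams. The exact formulation is not given in this excerpt; as a generic illustration of discrepancy-based alignment, here is a standard maximum mean discrepancy (MMD) estimate between two 1-D feature samples under an RBF kernel:

```python
import math

def rbf_mmd(x, y, sigma=1.0):
    """Squared MMD between two 1-D samples under an RBF kernel.
    Generic alignment-loss illustration only; not the paper's
    specific cross-modality suppression mechanism."""
    def k(a, b):
        return math.exp(-((a - b) ** 2) / (2 * sigma ** 2))
    def mean_k(u, v):
        # average kernel value over all cross pairs of the two samples
        return sum(k(a, b) for a in u for b in v) / (len(u) * len(v))
    return mean_k(x, x) + mean_k(y, y) - 2 * mean_k(x, y)
```

Minimizing such a discrepancy between per-modality embeddings is one common way to pull latent spaces together during training.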