Navigating Open Set Scenarios for Skeleton-Based Action Recognition
Authors: Kunyu Peng, Cheng Yin, Junwei Zheng, Ruiping Liu, David Schneider, Jiaming Zhang, Kailun Yang, M. Saquib Sarfraz, Rainer Stiefelhagen, Alina Roitberg
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We assess the performance of seven established open-set approaches on our task and identify their limits and critical generalization issues when dealing with skeleton information. To address these challenges, we propose a distance-based cross-modality ensemble method that leverages the cross-modal alignment of skeleton joints, bones, and velocities to achieve superior open-set recognition performance. We refer to the key idea as CrossMax, an approach that utilizes a novel cross-modality mean max discrepancy suppression mechanism to align latent spaces during training and a cross-modality distance-based logits refinement method during testing. CrossMax outperforms existing approaches and consistently yields state-of-the-art results across all datasets and backbones. We will release the benchmark, code, and models to the community. |
| Researcher Affiliation | Collaboration | Kunyu Peng1, Cheng Yin1, Junwei Zheng1, Ruiping Liu1, David Schneider1, Jiaming Zhang1, Kailun Yang2,*, M. Saquib Sarfraz1,3, Rainer Stiefelhagen1, Alina Roitberg4. 1 Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany; 2 School of Robotics, Hunan University, China; 3 Mercedes-Benz Tech Innovation, Germany; 4 Institute for Artificial Intelligence, University of Stuttgart, Germany |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We will release the benchmark, code, and models to the community. The benchmark and code can be found in https://github.com/KPeng9510/OS-SAR. |
| Open Datasets | Yes | This benchmark is derived from three public datasets for action recognition from body pose sequences: NTU60 (Shahroudy et al. 2016), NTU120 (Liu et al. 2020), and Toyota Smart Home (Dai et al. 2023), for which we formalize the open-set splits and an evaluation protocol. |
| Dataset Splits | Yes | Following the open-set recognition practices in image classification (Lu et al. 2022), we randomly sample sets of unseen classes and compute the averaged performance over five random splits. ... We build on the NTU60 (Shahroudy et al. 2016), NTU120 (Liu et al. 2020), and Toyota Smart Home (Dai et al. 2023) datasets for human action recognition from body pose sequences and adapt their splits to suit open set conditions. |
| Hardware Specification | Yes | Our method relies on PyTorch 1.8.0 and is trained with the SGD optimizer with learning rate (lr) 0.1, a step-wise lr scheduler with decay rate 0.1, steps for decay at {35, 55, 70}, weight decay 0.0004, and batch size 64 for 100 epochs on 4 Nvidia A100 GPUs with an Intel Xeon Gold 6230 processor. |
| Software Dependencies | Yes | Our method relies on PyTorch 1.8.0 and is trained with the SGD optimizer with learning rate (lr) 0.1, a step-wise lr scheduler with decay rate 0.1, steps for decay at {35, 55, 70}, weight decay 0.0004, and batch size 64 for 100 epochs on 4 Nvidia A100 GPUs with an Intel Xeon Gold 6230 processor. |
| Experiment Setup | Yes | Our method relies on PyTorch 1.8.0 and is trained with the SGD optimizer with learning rate (lr) 0.1, a step-wise lr scheduler with decay rate 0.1, steps for decay at {35, 55, 70}, weight decay 0.0004, and batch size 64 for 100 epochs on 4 Nvidia A100 GPUs with an Intel Xeon Gold 6230 processor. λ, Nk, and α are chosen as 0.1, 5, and 2.0, respectively. |
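The split protocol quoted above (randomly sampling sets of unseen classes and averaging over five random splits) can be sketched as follows. This is a minimal illustration, not the paper's released code: the helper name `make_open_set_splits` and the unseen-class count of 12 for NTU60 are assumptions, since the per-dataset split sizes are not quoted in the table.

```python
import random

def make_open_set_splits(num_classes, num_unseen, num_splits=5, seed=0):
    """Sample `num_splits` random seen/unseen class partitions.

    Hypothetical helper illustrating the protocol: each split holds out a
    random subset of class ids as 'unseen' (open-set) classes; results are
    then averaged over the five splits.
    """
    rng = random.Random(seed)
    splits = []
    for _ in range(num_splits):
        unseen = sorted(rng.sample(range(num_classes), num_unseen))
        seen = [c for c in range(num_classes) if c not in unseen]
        splits.append({"seen": seen, "unseen": unseen})
    return splits

# e.g. NTU60 has 60 action classes; the unseen count here is illustrative.
splits = make_open_set_splits(num_classes=60, num_unseen=12)
```

Fixing the random seed per split is what makes the five-split averaged evaluation reproducible across methods.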
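The quoted training setup pins down the learning-rate trajectory exactly: base lr 0.1 with step-wise decay by a factor of 0.1 at epochs {35, 55, 70} over 100 epochs. A dependency-free sketch of that schedule, assuming decays apply from the listed epochs onward:

```python
def lr_at_epoch(epoch, base_lr=0.1, milestones=(35, 55, 70), gamma=0.1):
    """Step-wise lr schedule: multiply by `gamma` once per milestone passed."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# In PyTorch this corresponds to attaching MultiStepLR to the SGD optimizer:
# opt = torch.optim.SGD(model.parameters(), lr=0.1, weight_decay=0.0004)
# sched = torch.optim.lr_scheduler.MultiStepLR(opt, milestones=[35, 55, 70], gamma=0.1)
```

So training runs at lr 0.1 for epochs 0-34, 0.01 for 35-54, 0.001 for 55-69, and 0.0001 for 70-99. (Momentum is not quoted in the excerpt, so it is omitted from the SGD call above.)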