Multimodal Fusion via Teacher-Student Network for Indoor Action Recognition

Authors: Bruce X.B. Yu, Yan Liu, Keith C.C. Chan

AAAI 2021, pp. 3199-3207

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "With extensive experiments on two benchmarking datasets: NTU RGB+D and PKU-MMD, results show that the proposed TSMF consistently performs better than state-of-the-art single modal and multimodal methods."
Researcher Affiliation | Academia | "Bruce X.B. Yu, Yan Liu, Keith C.C. Chan, Department of Computing, The Hong Kong Polytechnic University, {csxbyu, csyliu}@comp.polyu.edu.hk, keithccchan@gmail.com"
Pseudocode | No | The paper does not contain any blocks explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code | Yes | "Code is available: https://github.com/bruceyo/TSMF"
Open Datasets | Yes | "NTU RGB+D. The NTU RGB+D dataset (Shahroudy et al. 2016) was collected with Kinect v2 sensors." "PKU-MMD. The PKU-MMD dataset (Liu et al. 2017a) is another popular large dataset collected with Kinect v2."
Dataset Splits | Yes | "We followed the Cross-Subject (CS) and Cross-View (CV) split settings from (Shahroudy et al. 2016) for evaluating our method." "Similar with NTU RGB+D, we adopt the two evaluation protocols (i.e., cross-subject and cross-view) recommended in (Liu et al. 2017a)." (A minimal split sketch follows the table.)
Hardware Specification | Yes | "All experiments are conducted on a workstation with 4 GTX 1080 Ti GPUs."
Software Dependencies | No | The paper mentions software components such as ResNet, the OpenPose tool, and a stochastic gradient descent optimizer, but does not specify their version numbers.
Experiment Setup | Yes | "The initial learning rate is set as 0.1, which is decayed by 0.1 at epochs 10 and 50 and ended at the epoch 80. The minibatch size is set to 64." (A training-schedule sketch follows the table.)
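The Cross-Subject and Cross-View protocols quoted above partition NTU RGB+D by performer ID and camera ID, respectively. Below is a minimal sketch of that split logic, assuming the standard subject and camera lists from Shahroudy et al. (2016) and the dataset's SsssCcccPpppRrrrAaaa file-naming pattern; the helper `split_of` is illustrative and is not taken from the TSMF repository.

```python
# Minimal sketch of the NTU RGB+D Cross-Subject (CS) / Cross-View (CV) split
# (Shahroudy et al. 2016). Sample names follow the pattern SsssCcccPpppRrrrAaaa,
# e.g. "S001C002P003R002A013". The ID sets below are the standard split
# definitions; the function name is illustrative, not from the TSMF code.

# Performers whose samples go to the training set under Cross-Subject.
CS_TRAIN_SUBJECTS = {1, 2, 4, 5, 8, 9, 13, 14, 15, 16, 17, 18, 19,
                     25, 27, 28, 31, 34, 35, 38}
# Camera views whose samples go to the training set under Cross-View.
CV_TRAIN_CAMERAS = {2, 3}


def split_of(sample_name: str, benchmark: str = "cs") -> str:
    """Return 'train' or 'val' for an NTU RGB+D sample name."""
    subject = int(sample_name[sample_name.find("P") + 1:][:3])
    camera = int(sample_name[sample_name.find("C") + 1:][:3])
    if benchmark == "cs":
        return "train" if subject in CS_TRAIN_SUBJECTS else "val"
    if benchmark == "cv":
        return "train" if camera in CV_TRAIN_CAMERAS else "val"
    raise ValueError(f"unknown benchmark: {benchmark}")


print(split_of("S001C002P003R002A013", benchmark="cs"))  # 'val'   (performer 3)
print(split_of("S001C002P003R002A013", benchmark="cv"))  # 'train' (camera 2)
```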
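For reference, here is a minimal PyTorch sketch of the quoted optimization schedule: SGD with an initial learning rate of 0.1, decayed by a factor of 0.1 at epochs 10 and 50, training ending at epoch 80, and a mini-batch size of 64. The model, synthetic data, and momentum value are placeholders and are not taken from the TSMF paper or repository.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and synthetic data standing in for the TSMF network and its
# skeleton/RGB inputs; only the optimization schedule mirrors the quoted setup.
model = nn.Linear(256, 60)                       # 60 classes, as in NTU RGB+D
features = torch.randn(1024, 256)
labels = torch.randint(0, 60, (1024,))
loader = DataLoader(TensorDataset(features, labels), batch_size=64, shuffle=True)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum assumed
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[10, 50], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(80):                          # training ends at epoch 80
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()
    scheduler.step()                             # lr: 0.1 -> 0.01 at epoch 10, -> 0.001 at epoch 50
```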