Two-Stream Convolution Augmented Transformer for Human Activity Recognition

Authors: Bing Li, Wei Cui, Wei Wang, Le Zhang, Zhenghua Chen, Min Wu

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on four real experiment datasets demonstrate that our model outperforms state-of-the-art models in terms of both effectiveness and efficiency."
Researcher Affiliation | Collaboration | (1) School of Computer Science and Engineering, University of New South Wales, Australia; (2) Institute for Infocomm Research, Agency for Science, Technology and Research (A*STAR), Singapore; (3) Dongguan University of Technology, China
Pseudocode | No | The paper does not contain any clearly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | Code: https://github.com/windofshadow/THAT
Open Datasets | Yes | "We used four datasets for evaluation, i.e., Office Room, Activity Room, Meeting Room, and Activity+Meeting. The first dataset is publicly available, and the last three are collected by our prototype system." The public dataset is hosted at https://github.com/ermongroup/Wifi_Activity_Recognition.
Dataset Splits | Yes | "All datasets adopt 8:1:1 train/dev/test split." (An illustrative split is sketched after the table.)
Hardware Specification | Yes | "The model was implemented using PyTorch 1.4 with Python 3.6, and trained on an NVIDIA 1080Ti GPU."
Software Dependencies | Yes | "The model was implemented using PyTorch 1.4 with Python 3.6, and trained on an NVIDIA 1080Ti GPU."
Experiment Setup | Yes | The paper specifies the full configuration (sketched in code after the table):
- Average-pooling strides: 4 for the temporal stream and 3 for the channel stream.
- K = 10 Gaussian distributions, with µ values evenly spaced along the temporal dimension from 25 to 475 in steps of 50; all σ values were set to 8.
- Number of stacks: H = 5 for the temporal module and N = 1 for the channel module.
- Dimensionality d_in = d_k = d_o set to 90 (temporal) and 500 (channel); number of heads h set to 9 and 200; d_v = d_o / h; dropout rate 0.1.
- Kernel sizes of {1, 3, 5} (temporal) and {1, 2, 3} (channel), with hidden dimensions d_h of 360 and 4000.
- Kernel numbers w of 128 (temporal) and 16 (channel), with kernel sizes l of {10, 40} and {2, 4}; the dropout rate for this layer was 0.5.
- Optimization: Adam (Kingma and Ba 2014) with an initial learning rate of 0.001; all weight parameters initialized with Xavier (Glorot and Bengio 2010).
- All datasets adopt an 8:1:1 train/dev/test split; batch size 16; a maximum of 50 epochs, with the best model on the validation set selected for testing.
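For the dataset-splits row: the paper states the 8:1:1 ratio but not how examples are assigned to splits, so the following is a minimal sketch assuming a seeded random split in PyTorch; split_8_1_1 is a hypothetical helper name, not taken from the released code.

```python
import torch
from torch.utils.data import random_split

def split_8_1_1(dataset, seed=0):
    # 80/10/10 train/dev/test; any rounding remainder goes to the test split.
    n = len(dataset)
    n_train, n_dev = int(0.8 * n), int(0.1 * n)
    lengths = [n_train, n_dev, n - n_train - n_dev]
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, lengths, generator=generator)

# Usage: train_set, dev_set, test_set = split_8_1_1(csi_dataset)
```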
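For the experiment-setup row: the µ schedule (25 to 475 in steps of 50) implies a temporal axis of roughly 500 positions. Below is a minimal sketch, assuming that length, of how the K = 10 Gaussians with σ = 8 could be materialized as per-position weight vectors; the names and the softmax normalization are illustrative assumptions, not the paper's exact formulation.

```python
import torch

K = 10                                  # number of Gaussian distributions
SEQ_LEN = 500                           # assumed temporal length (µ spans 25..475)
MUS = torch.arange(25.0, 476.0, 50.0)   # evenly spaced means: 25, 75, ..., 475
SIGMA = 8.0                             # shared standard deviation

def gaussian_position_weights(seq_len=SEQ_LEN, mus=MUS, sigma=SIGMA):
    # Score each time step t against each of the K Gaussian means, then
    # normalize across the K Gaussians so every row sums to 1.
    t = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)  # (seq_len, 1)
    logits = -0.5 * ((t - mus.unsqueeze(0)) / sigma) ** 2        # (seq_len, K)
    return torch.softmax(logits, dim=1)

print(gaussian_position_weights().shape)  # torch.Size([500, 10])
```

The optimization settings, by contrast, are stated directly and translate almost one-to-one; the nn.Linear below is only a stand-in for the two-stream model, which the sketch does not reproduce.

```python
import torch.nn as nn
from torch.optim import Adam

def init_xavier(module):
    # Xavier-initialize weight matrices (Glorot and Bengio 2010).
    if isinstance(module, (nn.Linear, nn.Conv1d)):
        nn.init.xavier_uniform_(module.weight)
        if module.bias is not None:
            nn.init.zeros_(module.bias)

model = nn.Linear(90, 6)   # placeholder; 90 matches the temporal d_in
model.apply(init_xavier)
optimizer = Adam(model.parameters(), lr=0.001)  # initial learning rate from the paper
```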