Adversarial Cross-Domain Action Recognition with Co-Attention

Authors: Boxiao Pan, Zhangjie Cao, Ehsan Adeli, Juan Carlos Niebles

AAAI 2020, pp. 11815-11822

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experimental results on three cross-domain action recognition datasets demonstrate that TCoN improves both previous single-domain and cross-domain methods significantly under the cross-domain setting."
Researcher Affiliation | Academia | "Boxiao Pan,* Zhangjie Cao,* Ehsan Adeli, Juan Carlos Niebles. Stanford University. {bxpan, caozj, eadeli, jniebles}@cs.stanford.edu"
Pseudocode | No | The paper describes the model architecture and optimization steps in narrative text and diagrams (Figures 2, 3, and 4), but it does not include a formal pseudocode block or algorithm listing.
Open Source Code | No | The paper neither states that the source code for the described method will be released nor links to a code repository.
Open Datasets | Yes | "For TSN, C3D and TRN, we train on source and test on target directly. ... We adopt the same data augmentation technique as in (Wang et al. 2016). For C3D-based models, ... use the same base model (Tran et al. 2015) pre-trained on the Sports-1M dataset (Karpathy et al. 2014)." See also the dataset references: Kuehne, H.; Jhuang, H.; Garrote, E.; Poggio, T.; and Serre, T. 2011. HMDB: A large video database for human motion recognition. In ICCV. IEEE; and Soomro, K.; Zamir, A. R.; and Shah, M. 2012. UCF101: A dataset of 101 human actions classes from videos in the wild. arXiv preprint arXiv:1212.0402.
Dataset Splits | Yes | "For (UCF101-HMDB51)₁, (UCF101-HMDB51)₂ and UCF50-Olympic Sports, we follow the prior works to construct the datasets by selecting the same action classes in two domains, while for Jester, we merge sub-actions into super-actions and split half of sub-actions into each domain. Please refer to the supplementary material for full details." and "For the number of segments, we do grid search for each dataset in [1, minimum video length] on a validation set. Please refer to supplementary material for the actual numbers." A sketch of this split construction appears after the table.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU/CPU models, memory, or cloud instance types); it names only the PyTorch software framework.
Software Dependencies | No | The paper mentions the "PyTorch framework (Paszke et al. 2017)" but does not give its version number or list other software dependencies with versions.
Experiment Setup | Yes | "We use the Adam optimizer (Kingma and Ba 2014) and set the batch size to 64. For TSN and TRN-based models, we adopt the BN-Inception (Ioffe and Szegedy 2015) backbone pre-trained on ImageNet (Deng et al. 2009). The learning rate is initialized to 0.0003 and decreases by 1/10 every 30 epochs. ... We initialize the learning rate for the feature extractor to 0.001 while 0.01 for the classifier, since it is trained from scratch. The trade-off parameter λd is increased gradually from 0 to 1 as in DANN (Ganin and Lempitsky 2014). For the number of segments, we do grid search for each dataset in [1, minimum video length] on a validation set." A sketch of this training configuration follows the split-construction example below.
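
The split construction quoted under Dataset Splits reduces to two operations: intersecting the class sets of the two domains, and, for Jester, partitioning sub-actions between domains after grouping them into super-actions. The Python sketch below illustrates both; the class names, super-action groupings, and helper names are hypothetical, as the actual lists live in the paper's supplementary material.

```python
# Minimal sketch of the cross-domain split construction; all class names
# and groupings below are illustrative, not the paper's actual lists.

def shared_classes(source_classes, target_classes):
    """Shared-class pairs (e.g., UCF101-HMDB51): keep only the action
    classes present in both domains."""
    return sorted(set(source_classes) & set(target_classes))

# Hypothetical class names after mapping both datasets onto a common
# label space (prior works define this mapping manually).
ucf101 = {"basketball", "fencing", "golf", "pullup", "ride_bike"}
hmdb51 = {"fencing", "golf", "pullup", "ride_bike", "punch"}
print(shared_classes(ucf101, hmdb51))
# -> ['fencing', 'golf', 'pullup', 'ride_bike']

# Jester: merge sub-actions into super-actions, then send half of each
# super-action's sub-actions to each domain.
super_actions = {
    "swipe": ["Swiping Left", "Swiping Right"],
    "zoom": ["Zooming In With Two Fingers", "Zooming Out With Two Fingers"],
}
source_subs = {k: v[: len(v) // 2] for k, v in super_actions.items()}
target_subs = {k: v[len(v) // 2:] for k, v in super_actions.items()}
```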
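
The quoted training configuration also maps directly onto standard PyTorch constructs. The following sketch is an illustration rather than the authors' code: `feature_extractor` and `classifier` are placeholder modules, and the λd ramp uses the standard DANN schedule 2/(1 + e^(-10p)) - 1 over training progress p ∈ [0, 1], which the paper cites (Ganin and Lempitsky 2014) but does not restate.

```python
import math
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

# Placeholder modules standing in for the pre-trained backbone and the
# classifier head trained from scratch (shapes are hypothetical).
feature_extractor = nn.Linear(1024, 256)
classifier = nn.Linear(256, 12)

# Per-group learning rates as quoted: 0.001 for the pre-trained feature
# extractor, 0.01 for the from-scratch classifier.
optimizer = Adam([
    {"params": feature_extractor.parameters(), "lr": 0.001},
    {"params": classifier.parameters(), "lr": 0.01},
])

# "decreases by 1/10 every 30 epochs" is a step decay with gamma = 0.1.
scheduler = StepLR(optimizer, step_size=30, gamma=0.1)

def dann_lambda(progress: float, gamma: float = 10.0) -> float:
    """DANN schedule (Ganin and Lempitsky 2014): ramps the adversarial
    trade-off lambda_d from 0 toward 1 as progress goes from 0 to 1."""
    return 2.0 / (1.0 + math.exp(-gamma * progress)) - 1.0

# lambda_d at the start, midpoint, and end of training.
print([round(dann_lambda(p), 3) for p in (0.0, 0.5, 1.0)])  # [0.0, 0.987, 1.0]
```

For the TSN/TRN variant, the quoted single initial rate of 0.0003 would replace the two per-group rates above, with the same step decay.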