Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective

Authors: Pengfei Wei, Lingdong Kong, Xinghua Qu, Yi Ren, Zhiqiang Xu, Jing Jiang, Xiang Yin

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE compared with several state-of-the-art approaches.
Researcher Affiliation | Collaboration | Pengfei Wei, AI Lab, ByteDance (pengfei.wei@bytedance.com); Lingdong Kong, National University of Singapore (lingdong@comp.nus.edu.sg); Xinghua Qu, AI Lab, ByteDance (quxinghua17@gmail.com); Yi Ren, AI Lab, ByteDance (ren.yi@bytedance.com); Zhiqiang Xu, MBZUAI (zhiqiang.xu@mbzuai.ac.ae); Jing Jiang, University of Technology Sydney (Jing.Jiang@uts.edu.au); Xiang Yin, AI Lab, ByteDance (yinxiang.stephen@bytedance.com)
Pseudocode | No | The paper does not include pseudocode or a clearly labeled algorithm block.
Open Source Code | Yes | Code is publicly available at: https://github.com/ldkong1205/TranSVAE
Open Datasets | Yes | UCF-HMDB is constructed by collecting the relevant and overlapping action classes from UCF101 [35] and HMDB51 [20]. ... Jester [27] consists of 148,092 videos ... Epic-Kitchens [11] is a challenging egocentric dataset ... Sprites [22] contains sequences of animated cartoon characters...
Dataset Splits | Yes | UCF-HMDB is constructed by collecting the relevant and overlapping action classes from UCF101 [35] and HMDB51 [20]. It contains 3,209 videos in total with 1,438 training videos and 571 validation videos from UCF101, and 840 training videos and 360 validation videos from HMDB51.
Hardware Specification | Yes | NVIDIA A100 GPUs are used for all experiments.
Software Dependencies | No | Our TranSVAE is implemented with PyTorch [30].
Experiment Setup | Yes | We use Adam with a weight decay of 1e-4 as the optimizer. The learning rate is initially set to be 1e-3 and follows a commonly used decreasing strategy in [14]. The batch size and the learning epoch are uniformly set to be 128 and 1,000, respectively, for all the experiments. We uniformly set 100 epochs of training under only source supervision and involved the target pseudo-labels afterward.
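The reported training schedule can be sketched in plain Python. This is a minimal, hedged reconstruction, not the authors' code: the annealing formula lr0 / (1 + alpha * p)^beta (with alpha = 10, beta = 0.75) is the "commonly used decreasing strategy" popularized by the domain-adaptation literature the paper cites as [14], and the warmup gate mirrors the stated 100 source-only epochs before pseudo-labels are involved.

```python
def lr_schedule(progress, lr0=1e-3, alpha=10.0, beta=0.75):
    """Annealed learning rate: lr0 / (1 + alpha * p)^beta.

    `progress` is training progress in [0, 1] (e.g. epoch / 1000).
    Starts at lr0 = 1e-3, as reported in the experiment setup,
    and decays monotonically over training.
    """
    return lr0 / (1.0 + alpha * progress) ** beta


def use_pseudo_labels(epoch, warmup_epochs=100):
    """Gate target pseudo-labels: the first `warmup_epochs` epochs
    use source supervision only, per the reported setup."""
    return epoch >= warmup_epochs
```

In a PyTorch training loop these would typically be wired up via `torch.optim.Adam(params, lr=1e-3, weight_decay=1e-4)` together with `torch.optim.lr_scheduler.LambdaLR`, with the pseudo-label loss term only added once the warmup gate opens.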