Unsupervised Video Domain Adaptation for Action Recognition: A Disentanglement Perspective
Authors: Pengfei Wei, Lingdong Kong, Xinghua Qu, Yi Ren, Zhiqiang Xu, Jing Jiang, Xiang Yin
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on the UCF-HMDB, Jester, and Epic-Kitchens datasets verify the effectiveness and superiority of TranSVAE compared with several state-of-the-art approaches. |
| Researcher Affiliation | Collaboration | Pengfei Wei (ByteDance AI Lab, pengfei.wei@bytedance.com); Lingdong Kong (National University of Singapore, lingdong@comp.nus.edu.sg); Xinghua Qu (ByteDance AI Lab, quxinghua17@gmail.com); Yi Ren (ByteDance AI Lab, ren.yi@bytedance.com); Zhiqiang Xu (MBZUAI, zhiqiang.xu@mbzuai.ac.ae); Jing Jiang (University of Technology Sydney, Jing.Jiang@uts.edu.au); Xiang Yin (ByteDance AI Lab, yinxiang.stephen@bytedance.com) |
| Pseudocode | No | The paper does not include pseudocode or a clearly labeled algorithm block. |
| Open Source Code | Yes | Code is publicly available at: https://github.com/ldkong1205/TranSVAE |
| Open Datasets | Yes | UCF-HMDB is constructed by collecting the relevant and overlapping action classes from UCF101 [35] and HMDB51 [20]. ... Jester [27] consists of 148,092 videos ... Epic-Kitchens [11] is a challenging egocentric dataset ... Sprites [22] contains sequences of animated cartoon characters... |
| Dataset Splits | Yes | UCF-HMDB is constructed by collecting the relevant and overlapping action classes from UCF101 [35] and HMDB51 [20]. It contains 3,209 videos in total with 1,438 training videos and 571 validation videos from UCF101, and 840 training videos and 360 validation videos from HMDB51. |
| Hardware Specification | Yes | NVIDIA A100 GPUs are used for all experiments. |
| Software Dependencies | No | Our TranSVAE is implemented with PyTorch [30]. |
| Experiment Setup | Yes | We use Adam with a weight decay of 1e-4 as the optimizer. The learning rate is initially set to 1e-3 and follows a commonly used decreasing strategy in [14]. The batch size and the number of training epochs are uniformly set to 128 and 1,000, respectively, for all experiments. Training uses only source supervision for the first 100 epochs, after which target pseudo-labels are incorporated. |
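The hyper-parameters reported in the Experiment Setup row can be collected into a small sketch. The decay schedule below follows the common form `lr0 / (1 + alpha * p) ** beta` over training progress `p`; the exact schedule from [14] is not reproduced in this excerpt, so `alpha` and `beta` are illustrative assumptions, while the learning rate, weight decay, batch size, and epoch counts are taken from the paper.

```python
def decayed_lr(initial_lr: float, progress: float,
               alpha: float = 10.0, beta: float = 0.75) -> float:
    """Decreasing learning-rate schedule of the common form
    lr0 / (1 + alpha * p) ** beta, where p in [0, 1] is training progress.
    alpha and beta are assumed defaults, not values stated in the paper."""
    return initial_lr / (1.0 + alpha * progress) ** beta

# Values stated in the paper's Experiment Setup.
INITIAL_LR = 1e-3
WEIGHT_DECAY = 1e-4          # Adam weight decay
BATCH_SIZE = 128
TOTAL_EPOCHS = 1000
SOURCE_ONLY_EPOCHS = 100     # target pseudo-labels join after this point

# Per-epoch learning rates over the full run.
schedule = [decayed_lr(INITIAL_LR, e / TOTAL_EPOCHS) for e in range(TOTAL_EPOCHS)]
```

With these assumed `alpha`/`beta` values, the schedule starts at 1e-3 and decays monotonically; in a PyTorch training loop it would typically be applied via a `LambdaLR` scheduler wrapped around the Adam optimizer.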