Diversifying Spatial-Temporal Perception for Video Domain Generalization

Authors: Kun-Yu Lin, Jia-Run Du, Yipeng Gao, Jiaming Zhou, Wei-Shi Zheng

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three benchmarks of different types demonstrate the effectiveness and versatility of our approach.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, Sun Yat-sen University, China; 2 Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is released at https://github.com/KunyuLin/STDN/.
Open Datasets | Yes | UCF-HMDB is the most widely used benchmark for cross-domain video classification [15, 18], which contains 3,809 videos of 12 overlapping sport categories shared by UCF101 [94] and HMDB51 [95]. EPIC-Kitchens-DG is a cross-scene egocentric action recognition benchmark, which consists of 10,094 videos across 8 egocentric action classes from three domains (scenes), following Munro et al. [49]. Jester-DG is a cross-domain hand gesture recognition benchmark. We select videos from the Jester dataset [97] and construct two domains following Pan et al. [19].
Dataset Splits | Yes | We split the source video set into training and validation sets following previous source validation protocols [25, 15], i.e., a reasonable in-domain model selection scheme for better generalization ability in unseen target domains.
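The source-validation protocol quoted above keeps model selection entirely within the source domain: only source videos are partitioned into train/val, and the target domain is never touched until final evaluation. A minimal sketch of such a split (the `val_fraction` and `seed` values are illustrative choices, not taken from the paper):

```python
import random

def source_validation_split(source_videos, val_fraction=0.2, seed=0):
    """Split SOURCE-domain videos into train/val sets for model selection.

    The target domain stays unseen during both training and selection,
    as required by the source-validation protocol. val_fraction and seed
    are hypothetical defaults for illustration.
    """
    rng = random.Random(seed)
    shuffled = list(source_videos)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    val_set = shuffled[:n_val]
    train_set = shuffled[n_val:]
    return train_set, val_set
```

The held-out source validation set is then used to pick checkpoints/hyperparameters before a single evaluation on the unseen target domain.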
Hardware Specification | Yes | All experiments are conducted in PyTorch [99] with four NVIDIA GTX 1080Ti GPUs.
Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | We take N = 5 frames for each video for temporal modeling. We set K = 4, τ = 0.5, Ds = 192 and Dt = 256. F(·) is a linear classifier and Frel(·) is an MLP classifier. All parameters are optimized using mini-batch SGD with a batch size of 32, a momentum of 0.9, a learning rate of 1e-3 and a weight decay of 5e-4. By default, the trade-off hyperparameters are set as λent = 0.1 and λrel = 0.5.
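The reported optimizer (mini-batch SGD, momentum 0.9, learning rate 1e-3, weight decay 5e-4) follows the standard PyTorch-style update rule. A plain-Python sketch of a single parameter update under those settings (purely illustrative; the authors use PyTorch's built-in `torch.optim.SGD`):

```python
# Hyperparameters as reported in the paper's experiment setup.
LR = 1e-3
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4

def sgd_step(w, grad, velocity):
    """One SGD step with momentum and L2 weight decay (PyTorch convention).

    Weight decay is folded into the gradient, then the momentum buffer
    accumulates the decayed gradient before the parameter update.
    Scalar arguments here stand in for whole parameter tensors.
    """
    g = grad + WEIGHT_DECAY * w      # L2 weight decay added to the gradient
    v = MOMENTUM * velocity + g      # momentum buffer update (dampening = 0)
    w = w - LR * v                   # parameter update
    return w, v
```

For example, starting from `w = 1.0`, `grad = 0.5`, and a zero momentum buffer, one step yields `v = 0.5005` and `w = 1.0 - 1e-3 * 0.5005 = 0.9994995`.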