Diversifying Spatial-Temporal Perception for Video Domain Generalization
Authors: Kun-Yu Lin, Jia-Run Du, Yipeng Gao, Jiaming Zhou, Wei-Shi Zheng
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on three benchmarks of different types demonstrate the effectiveness and versatility of our approach. |
| Researcher Affiliation | Academia | ¹School of Computer Science and Engineering, Sun Yat-sen University, China; ²Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is released at https://github.com/KunyuLin/STDN/. |
| Open Datasets | Yes | UCF-HMDB is the most widely used benchmark for cross-domain video classification [15, 18], which contains 3,809 videos of 12 overlapping sport categories shared by UCF101 [94] and HMDB51 [95]. EPIC-Kitchens-DG is a cross-scene egocentric action recognition benchmark, which consists of 10,094 videos across 8 egocentric action classes from three domains (scenes), following Munro et al. [49]. Jester-DG is a cross-domain hand gesture recognition benchmark. We select videos from the Jester dataset [97] and construct two domains following Pan et al. [19]. |
| Dataset Splits | Yes | We split the source video set into training and validation sets following previous source validation protocols [25, 15], i.e., a reasonable in-domain model selection scheme for better generalization ability in unseen target domains. |
| Hardware Specification | Yes | All experiments are conducted by PyTorch [99] with four NVIDIA GTX 1080Ti GPUs. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for software dependencies. |
| Experiment Setup | Yes | We take N = 5 frames for each video for temporal modeling. We set K = 4, τ = 0.5, D_s = 192 and D_t = 256. F(·) is a linear classifier and F_rel(·) is an MLP classifier. All parameters are optimized using mini-batch SGD with a batch size of 32, a momentum of 0.9, a learning rate of 1e-3 and a weight decay of 5e-4. By default, the trade-off hyperparameters are set as λ_ent = 0.1 and λ_rel = 0.5. |
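
For anyone re-running the setup, the hyperparameters quoted in the Experiment Setup row can be collected in one place. The sketch below is illustrative only: the class name `SetupConfig`, its field names, and both helper methods are hypothetical and are not taken from the released code. In particular, the weighted-sum form of the objective in `total_loss` is an assumption about how the trade-off weights are applied, since the quoted text gives only their values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetupConfig:
    """Values quoted in the Experiment Setup row; all names are hypothetical."""

    num_frames: int = 5          # N, frames sampled per video
    k: int = 4                   # K, as reported
    tau: float = 0.5             # tau, as reported
    dim_s: int = 192             # D_s
    dim_t: int = 256             # D_t
    batch_size: int = 32
    lr: float = 1e-3
    momentum: float = 0.9
    weight_decay: float = 5e-4
    lambda_ent: float = 0.1      # trade-off weight for the "ent" term
    lambda_rel: float = 0.5      # trade-off weight for the "rel" term

    def sgd_kwargs(self) -> dict:
        """Optimizer settings keyed by the usual SGD keyword names."""
        return {
            "lr": self.lr,
            "momentum": self.momentum,
            "weight_decay": self.weight_decay,
        }

    def total_loss(self, cls_loss: float, ent_loss: float, rel_loss: float) -> float:
        # ASSUMPTION: the objective is a plain weighted sum of three terms.
        # The quoted text does not state the exact form of the loss.
        return cls_loss + self.lambda_ent * ent_loss + self.lambda_rel * rel_loss


cfg = SetupConfig()
print(cfg.sgd_kwargs())
```

With PyTorch installed, the returned dictionary could be passed along as `torch.optim.SGD(model.parameters(), **cfg.sgd_kwargs())`, where `model` stands in for whatever network is being trained.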