Diversifying Spatial-Temporal Perception for Video Domain Generalization

Authors: Kun-Yu Lin, Jia-Run Du, Yipeng Gao, Jiaming Zhou, Wei-Shi Zheng

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three benchmarks of different types demonstrate the effectiveness and versatility of our approach.
Researcher Affiliation | Academia | 1 School of Computer Science and Engineering, Sun Yat-sen University, China; 2 Key Laboratory of Machine Intelligence and Advanced Computing, Ministry of Education, China
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code is released at https://github.com/KunyuLin/STDN/.
Open Datasets | Yes | UCF-HMDB is the most widely used benchmark for cross-domain video classification [15, 18], which contains 3,809 videos of 12 overlapping sport categories shared by UCF101 [94] and HMDB51 [95]. EPIC-Kitchens-DG is a cross-scene egocentric action recognition benchmark, which consists of 10,094 videos across 8 egocentric action classes from three domains (scenes), following Munro et al. [49]. Jester-DG is a cross-domain hand gesture recognition benchmark. We select videos from the Jester dataset [97] and construct two domains following Pan et al. [19].
Dataset Splits | Yes | We split the source video set into training and validation sets following previous source validation protocols [25, 15], i.e., a reasonable in-domain model selection scheme for better generalization ability in unseen target domains.
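The source-validation protocol quoted above keeps model selection entirely within the source domain: only source videos are partitioned into train/val, and the target domain is never touched until final evaluation. A minimal sketch of such a split (the `val_fraction` and `seed` values are illustrative choices, not taken from the paper):

```python
import random

def source_validation_split(source_videos, val_fraction=0.2, seed=0):
    """Split SOURCE-domain videos into train/val sets for model selection.

    The target domain stays unseen during both training and selection,
    as required by the source-validation protocol. val_fraction and seed
    are hypothetical defaults for illustration.
    """
    rng = random.Random(seed)
    shuffled = list(source_videos)
    rng.shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    val_set = shuffled[:n_val]
    train_set = shuffled[n_val:]
    return train_set, val_set
```

The held-out source validation set is then used to pick checkpoints/hyperparameters before a single evaluation on the unseen target domain.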
Hardware Specification | Yes | All experiments are conducted in PyTorch [99] with four NVIDIA GTX 1080Ti GPUs.
Software Dependencies | No | The paper mentions 'PyTorch' but does not provide specific version numbers for software dependencies.
Experiment Setup | Yes | We take N = 5 frames for each video for temporal modeling. We set K = 4, τ = 0.5, Ds = 192 and Dt = 256. F(·) is a linear classifier and Frel(·) is an MLP classifier. All parameters are optimized using mini-batch SGD with a batch size of 32, a momentum of 0.9, a learning rate of 1e-3 and a weight decay of 5e-4. By default, the trade-off hyperparameters are set as λent = 0.1 and λrel = 0.5.
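The reported optimizer (mini-batch SGD, momentum 0.9, learning rate 1e-3, weight decay 5e-4) follows the standard PyTorch-style update rule. A plain-Python sketch of a single parameter update under those settings (purely illustrative; the authors use PyTorch's built-in `torch.optim.SGD`):

```python
# Hyperparameters as reported in the paper's experiment setup.
LR = 1e-3
MOMENTUM = 0.9
WEIGHT_DECAY = 5e-4

def sgd_step(w, grad, velocity):
    """One SGD step with momentum and L2 weight decay (PyTorch convention).

    Weight decay is folded into the gradient, then the momentum buffer
    accumulates the decayed gradient before the parameter update.
    Scalar arguments here stand in for whole parameter tensors.
    """
    g = grad + WEIGHT_DECAY * w      # L2 weight decay added to the gradient
    v = MOMENTUM * velocity + g      # momentum buffer update (dampening = 0)
    w = w - LR * v                   # parameter update
    return w, v
```

For example, starting from `w = 1.0`, `grad = 0.5`, and a zero momentum buffer, one step yields `v = 0.5005` and `w = 1.0 - 1e-3 * 0.5005 = 0.9994995`.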