Contrastive Masked Autoencoders for Self-Supervised Video Hashing

Authors: Yuting Wang, Jinpeng Wang, Bin Chen, Ziyun Zeng, Shu-Tao Xia

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three large-scale video datasets (i.e., FCVID, ActivityNet and YFCC) indicate that ConMH achieves state-of-the-art results.
Researcher Affiliation | Academia | 1 Tsinghua Shenzhen International Graduate School, Tsinghua University; 2 Harbin Institute of Technology, Shenzhen; 3 Research Center of Artificial Intelligence, Peng Cheng Laboratory
Pseudocode | No | The paper describes the ConMH framework and its components in detail but does not include any pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/huangmozhi9527/ConMH.
Open Datasets | Yes | We evaluate our ConMH on three large-scale video datasets: FCVID (Jiang et al. 2017), ActivityNet (Caba Heilbron et al. 2015) and YFCC (Thomee et al. 2015).
Dataset Splits | Yes | FCVID contains 91,223 videos in 239 categories. Following (Song et al. 2018), we use 45,585 videos of FCVID for training and 45,600 videos for the retrieval database and queries. ActivityNet covers 200 categories of various human activities. Due to the lack of original test labels, we use 9,722 videos as our training set and the validation set as our test set. Similar to (Li et al. 2021), we uniformly sample 1,000 videos across the 200 categories as queries and use the remaining 3,758 videos as the retrieval database. YFCC contains 0.8M videos in 80 categories and is one of the largest public video datasets available in the real world. We use 409,788 videos for training and 101,256 videos for testing. We randomly choose 1,000 videos with non-zero labels among the testing set as queries and the remaining ones as our retrieval database. (A split-construction sketch follows the table.)
Hardware Specification | Yes | Our model is implemented in PyTorch with an NVIDIA RTX 2080 Ti GPU.
Software Dependencies | No | The paper mentions PyTorch but does not specify a version number. No other software or library versions are provided.
Experiment Setup | Yes | We set the batch size to 512, the masking ratio to 0.75, α to 1.0, τ to 0.5, and ρ to 0.1, and train our model for 800, 500, and 40 epochs on FCVID, ActivityNet, and YFCC respectively. We set the initial learning rate to 0.0001 and decay it to 90% every 20 epochs, with a minimal learning rate of 0.00001. We optimize our model with the Adam optimizer (Kingma and Ba 2014). (A training-configuration sketch follows the table.)