Curriculum Multi-Negative Augmentation for Debiased Video Grounding

Authors: Xiaohan Lan, Yitian Yuan, Hong Chen, Xin Wang, Zequn Jie, Lin Ma, Zhi Wang, Wenwu Zhu

AAAI 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on the newly collected Charades-CD and ActivityNet-CD datasets demonstrate that our proposed strategy can improve the performance of the base model in both i.i.d. and o.o.d. scenarios.
Researcher Affiliation | Collaboration | 1 Tsinghua University; 2 Meituan Inc.
Pseudocode | Yes | Algorithm 1: Multi-stage Curriculum Process (a minimal sketch of this loop appears after this table).
Open Source Code | Yes | Our codes are available at https://github.com/rubylan/Curri-Multi-NA
Open Datasets | Yes | To prove the effectiveness of our method, we conduct experiments on the newly collected Charades-CD and ActivityNet-CD datasets (Yuan et al. 2021).
Dataset Splits | Yes | The numbers of videos in the train/val/test-iid/test-ood splits are 4,564/333/333/1,442, and the numbers of video-query pairs are 11,071/859/823/3,375, respectively.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions using I3D, C3D, and GloVe for feature extraction and encoding, but does not specify version numbers for these or any other software dependencies required to replicate the experiments.
Experiment Setup | Yes | As for the training strategy, we trained 30/20 (i.e., Tmax) epochs for Charades-CD/ActivityNet-CD and report results of the epoch that performs best on the test-iid set under the R@1, IoU=0.7 metric. The batch sizes and learning rates were set to 64/32 and 0.0005/0.0001, respectively. λ1, λ2, and λ3 in the loss L were all set to 5.0 for Charades-CD and to 15.0 for ActivityNet-CD. We adaptively trained the model with the multi-stage curriculum process and set the stage-update epochs T1/T2/T3 to 3/7/18 and 2/5/13, respectively. As for the model architecture, to implement the Multi-NA strategy we set the mask ratio α to 0.55 and the number of generated samples per NA type (i.e., N_{cc,vc,ss}) to 1 on both datasets. (These settings are collected in the configuration sketch below.)
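
For readers trying to reconstruct Algorithm 1 from the details above, the following is a minimal sketch of the multi-stage curriculum loop. It assumes the stages progressively enable the three negative-augmentation (NA) types as the stage-update epochs T1/T2/T3 pass; that schedule, the `train_one_epoch` helper, and the per-stage NA ordering are assumptions for illustration, not taken from the paper or the released code.

```python
# A minimal sketch of the multi-stage curriculum process (Algorithm 1).
# Assumption: each stage enables one more negative-augmentation (NA) type;
# verify the exact per-stage behavior against the paper and repository.

T1, T2, T3 = 3, 7, 18          # stage-update epochs reported for Charades-CD
T_MAX = 30                     # total training epochs reported for Charades-CD

NA_TYPES = ["cc", "vc", "ss"]  # the three NA types behind N_{cc,vc,ss}

def active_na_types(epoch: int) -> list[str]:
    """Return the NA types assumed active at a given epoch."""
    if epoch < T1:
        return []              # warm-up stage: base grounding model only
    if epoch < T2:
        return NA_TYPES[:1]    # stage 1: first NA type enabled
    if epoch < T3:
        return NA_TYPES[:2]    # stage 2: two NA types enabled
    return NA_TYPES            # final stage: all three NA types enabled

for epoch in range(T_MAX):
    na_types = active_na_types(epoch)
    # train_one_epoch(model, train_loader, na_types)  # hypothetical helper
    # evaluate(model, test_iid_loader)  # select best epoch by R@1, IoU=0.7
```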
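
The reported hyperparameters can be collected into a single configuration for replication. The dictionary layout and key names below are illustrative and not taken from the authors' repository; every value, however, comes directly from the Experiment Setup row above.

```python
# Hyperparameters quoted in the "Experiment Setup" row. The dict layout and
# key names are illustrative assumptions; the values are as reported.
CONFIGS = {
    "Charades-CD": {
        "epochs": 30,                 # Tmax
        "batch_size": 64,
        "learning_rate": 5e-4,
        "loss_weight": 5.0,           # λ1 = λ2 = λ3
        "stage_updates": (3, 7, 18),  # T1, T2, T3
        "mask_ratio": 0.55,           # α in Multi-NA
        "negs_per_type": 1,           # N_cc = N_vc = N_ss
    },
    "ActivityNet-CD": {
        "epochs": 20,                 # Tmax
        "batch_size": 32,
        "learning_rate": 1e-4,
        "loss_weight": 15.0,          # λ1 = λ2 = λ3
        "stage_updates": (2, 5, 13),  # T1, T2, T3
        "mask_ratio": 0.55,           # α in Multi-NA
        "negs_per_type": 1,           # N_cc = N_vc = N_ss
    },
}
```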