Curriculum Multi-Negative Augmentation for Debiased Video Grounding
Authors: Xiaohan Lan, Yitian Yuan, Hong Chen, Xin Wang, Zequn Jie, Lin Ma, Zhi Wang, Wenwu Zhu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on newly collected Charades-CD and ActivityNet-CD datasets demonstrate our proposed strategy can improve the performance of the base model in both i.i.d. and o.o.d. scenarios. |
| Researcher Affiliation | Collaboration | Tsinghua University; Meituan Inc. |
| Pseudocode | Yes | Algorithm 1: Multi-stage Curriculum Process |
| Open Source Code | Yes | Our codes are available at https://github.com/rubylan/Curri-Multi-NA |
| Open Datasets | Yes | To prove the effectiveness of our method, we conduct experiments on the newly collected Charades-CD and ActivityNet-CD datasets (Yuan et al. 2021). |
| Dataset Splits | Yes | The numbers of videos in train/val/test-iid/test-ood splits are 4,564/333/333/1,442, and the numbers of video-query pairs are 11,071/859/823/3,375, respectively. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using I3D, C3D, and GloVe for feature extraction and encoding, but does not specify version numbers for these or any other software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | As for the training strategy setting, we trained 30/20 (i.e., Tmax) epochs for Charades-CD/ActivityNet-CD and report results of the epoch on which the test-iid set performs best with metric R@1, IoU=0.7. The batch sizes and learning rates were set to 64/32 and 0.0005/0.0001, respectively. λ_{1,2,3} in L were all set to 5.0 for Charades-CD, and set to 15.0 for ActivityNet-CD. We adaptively trained the model with the multi-stage curriculum process and set the training stage update times T1, T2, and T3 to 3/7/18 and 2/5/13, respectively. As for the model architecture setting, to implement the Multi-NA strategy, we set the mask ratio α to 0.55 and the number of generated samples per NA type (i.e., N_{cc,vc,ss}) to 1 on both datasets. |
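The stage-update schedule quoted above (epoch thresholds T1/T2/T3 of 3/7/18 for Charades-CD and 2/5/13 for ActivityNet-CD) can be sketched as a simple epoch-to-stage lookup. This is a hedged illustration, not the authors' code: the function name `curriculum_stage` and the meaning assigned to each stage index are assumptions; only the threshold values come from the paper.

```python
def curriculum_stage(epoch: int, t1: int, t2: int, t3: int) -> int:
    """Return the curriculum stage index (0-3) active at a given epoch.

    Stage 0 is assumed to be an initial warm-up before any stage update;
    each threshold T1 < T2 < T3 advances the training to the next stage.
    """
    if epoch < t1:
        return 0
    if epoch < t2:
        return 1
    if epoch < t3:
        return 2
    return 3

# Charades-CD schedule from the paper: T1/T2/T3 = 3/7/18, Tmax = 30 epochs
charades_stages = [curriculum_stage(e, 3, 7, 18) for e in range(30)]

# ActivityNet-CD schedule: T1/T2/T3 = 2/5/13, Tmax = 20 epochs
anet_stages = [curriculum_stage(e, 2, 5, 13) for e in range(20)]
```

Under this reading, the final stage (index 3) runs for the remaining epochs after T3, i.e., 12 of 30 epochs on Charades-CD and 7 of 20 on ActivityNet-CD.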