Proposal-Free Video Grounding with Contextual Pyramid Network

Authors: Kun Li, Dan Guo, Meng Wang (pp. 1902-1910)

AAAI 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on ActivityNet Captions, Charades-STA, and TACoS datasets demonstrate that our approach outperforms state-of-the-art methods.
Researcher Affiliation | Academia | Key Laboratory of Knowledge Engineering with Big Data (HFUT), Ministry of Education; School of Computer Science and Information Engineering, Hefei University of Technology (HFUT); School of Artificial Intelligence, Intelligent Interconnected Systems Laboratory of Anhui Province (Hefei University of Technology)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any statement or link regarding the availability of open-source code for the described methodology.
Open Datasets | Yes | 1) Charades-STA (Gao et al. 2017)... is split into training and testing parts with 12,408 pairs and 3,720 pairs, respectively. 2) ActivityNet Captions (Krishna et al. 2017)... the dataset is split into the training/validation/testing sets of 37,421, 17,505, and 17,031 query-clip pairs. 3) TACoS (Regneri et al. 2013)... There are 10146, 4589, and 4083 query-clip pairs for training, validation, and testing, respectively.
Dataset Splits | Yes | ActivityNet Captions (Krishna et al. 2017)... the dataset is split into the training/validation/testing sets of 37,421, 17,505, and 17,031 query-clip pairs. TACoS (Regneri et al. 2013)... There are 10146, 4589, and 4083 query-clip pairs for training, validation, and testing, respectively.
Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU models, memory) used for running its experiments.
Software Dependencies | No | The paper mentions optimizers (Adam) and pre-trained models (C3D, I3D, GloVe), but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch/TensorFlow versions).
Experiment Setup | Yes | The transformed dimension d of the feature encoding phase is set to 512. In the implementation of simplified QANet, the kernel size and the layer number of depthwise convolutions are set to 15 and 4, respectively; the head of multi-head self-attention is set to 8. We optimize the network by Adam optimizer (Kingma and Ba 2015) with a batch size of 100 and set the initial learning rate to 1e-4 and gradient clipping of 0.5.
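Since the paper releases no code, the reported setup can only be sketched. The dictionary keys below are illustrative names (not from the authors' implementation), and because the paper does not say whether the 0.5 clipping threshold applies per value or to the global norm, the helper assumes norm-based clipping, as in common utilities such as `torch.nn.utils.clip_grad_norm_`:

```python
import math

# Hyperparameters quoted in the Experiment Setup row above.
# Key names are illustrative assumptions, not the authors' identifiers.
CONFIG = {
    "feature_dim": 512,       # transformed dimension d of feature encoding
    "conv_kernel_size": 15,   # depthwise convolutions in simplified QANet
    "conv_layers": 4,
    "attention_heads": 8,     # heads of multi-head self-attention
    "batch_size": 100,
    "learning_rate": 1e-4,    # initial learning rate for Adam
    "grad_clip": 0.5,         # gradient clipping threshold
}

def clip_by_global_norm(grads, max_norm):
    """Rescale a flat list of gradient values so their global L2 norm
    does not exceed max_norm; gradients within the bound pass through."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```

In a typical training loop this would run between the backward pass and the Adam update step, so a single outlier batch cannot produce a destabilizing parameter jump.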