Commonsense for Zero-Shot Natural Language Video Localization

Authors: Meghana Holla, Ismini Lourentzou

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through empirical evaluations on two benchmark datasets, we demonstrate that CORONET surpasses both zero-shot and weakly supervised baselines, achieving improvements of up to 32.13% across various recall thresholds and up to 6.33% in mIoU."
Researcher Affiliation | Academia | Meghana Holla (Department of Computer Science, Virginia Tech); Ismini Lourentzou (School of Information Sciences, University of Illinois at Urbana-Champaign)
Pseudocode | No | The paper describes its model architecture and components in detail through text and diagrams, but it does not include any formal pseudocode blocks or labeled algorithms.
Open Source Code | Yes | "Code available at https://github.com/PLAN-Lab/CORONET"
Open Datasets | Yes | "Consistent with prior zero-shot NLVL research, we evaluate on Charades-STA (Gao et al. 2017) and ActivityNet Captions (Heilbron et al. 2015; Krishna et al. 2017)."
Dataset Splits | No | The paper states that it uses video components for training and query/video span annotations for evaluation, and mentions evaluating on benchmark datasets. However, it provides no percentages or counts for the training, validation, or test splits, nor does it cite a source for predefined splits.
Hardware Specification | No | The paper does not describe the hardware used for its experiments, such as specific GPU or CPU models, memory, or cloud computing instances.
Software Dependencies | No | The paper mentions using an "off-the-shelf object detector" and a "fine-tuned language model", but it does not specify any software names with version numbers (e.g., PyTorch, Python, or specific library versions) that would be needed for reproducibility.
Experiment Setup | No | The paper states "The training objective is L_loc = L_treg + λL_ta, where λ is a balancing hyperparameter," but it does not provide the value of λ or other hyperparameters such as learning rate, batch size, or optimizer settings needed to reproduce the experimental setup.
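The recall and mIoU figures cited in the Research Type row are standard NLVL evaluation metrics. As a minimal sketch (function names are illustrative, not taken from the paper's code), temporal IoU, R@IoU, and mIoU over predicted and ground-truth spans can be computed as follows:

```python
def temporal_iou(pred, gt):
    """IoU between two temporal spans given as (start, end) pairs."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def recall_at_iou(preds, gts, threshold=0.5):
    """Fraction of queries whose predicted span overlaps ground truth
    with IoU at or above the threshold (R@IoU=threshold)."""
    hits = sum(temporal_iou(p, g) >= threshold for p, g in zip(preds, gts))
    return hits / len(gts)

def mean_iou(preds, gts):
    """Average temporal IoU across all queries (mIoU)."""
    return sum(temporal_iou(p, g) for p, g in zip(preds, gts)) / len(gts)
```

Zero-shot NLVL papers typically report R@IoU at thresholds such as 0.3, 0.5, and 0.7 alongside mIoU; the exact thresholds used would need to be confirmed against the paper.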