Commonsense for Zero-Shot Natural Language Video Localization
Authors: Meghana Holla, Ismini Lourentzou
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical evaluations on two benchmark datasets, we demonstrate that CORONET surpasses both zero-shot and weakly supervised baselines, achieving improvements up to 32.13% across various recall thresholds and up to 6.33% in mIoU. (See the metric sketch below the table.) |
| Researcher Affiliation | Academia | Meghana Holla (Department of Computer Science, Virginia Tech); Ismini Lourentzou (School of Information Sciences, University of Illinois at Urbana-Champaign) |
| Pseudocode | No | The paper describes its model architecture and components in detail through text and diagrams, but it does not include any formal pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Code available at https://github.com/PLAN-Lab/CORONET |
| Open Datasets | Yes | Consistent with prior zero-shot NLVL research, we evaluate on Charades-STA (Gao et al. 2017) and ActivityNet-Captions (Heilbron et al. 2015; Krishna et al. 2017). |
| Dataset Splits | No | The paper states it utilizes video components for training and query/video span annotations for evaluation, and mentions evaluating on benchmark datasets. However, it does not provide specific details regarding the percentages or counts for training, validation, or test splits, nor does it cite a source for predefined splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU or CPU models, memory, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using an 'off-the-shelf object detector' and a 'fine-tuned language model', but it does not specify any software names with version numbers (e.g., PyTorch version, Python version, specific library versions) that would be needed for reproducibility. |
| Experiment Setup | No | The paper states 'The training objective is L_loc = L_treg + λL_ta, where λ is a balancing hyperparameter,' but it does not provide the specific value for λ or other detailed hyperparameters, such as the learning rate, batch size, or optimizer settings, needed to reproduce the experimental setup. (See the loss sketch below the table.) |
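
The recall and mIoU figures quoted in the Research Type row follow the standard NLVL evaluation protocol. For reference, below is a minimal Python sketch of how Recall@IoU and mIoU are conventionally computed over predicted temporal spans. The function names and the threshold set (0.3, 0.5, 0.7) are illustrative assumptions; the paper's actual evaluation code is not reproduced here.

```python
# Minimal sketch of standard NLVL evaluation metrics (Recall@IoU and mIoU).
# Threshold set and function names are illustrative assumptions, not taken
# from the paper.

def temporal_iou(pred, gt):
    """IoU between two (start, end) temporal spans, in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def evaluate(predictions, ground_truths, thresholds=(0.3, 0.5, 0.7)):
    """Return Recall@IoU for each threshold and the mean IoU (mIoU)."""
    ious = [temporal_iou(p, g) for p, g in zip(predictions, ground_truths)]
    recall = {t: sum(iou >= t for iou in ious) / len(ious) for t in thresholds}
    miou = sum(ious) / len(ious)
    return recall, miou
```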
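
For the Experiment Setup row, the quoted objective combines two loss terms with a balancing weight λ that the paper does not disclose. The PyTorch sketch below shows only the quoted combination, L_loc = L_treg + λL_ta; the value LAMBDA_TA = 1.0 is a placeholder assumption, and the inner definitions of L_treg and L_ta are left opaque because the paper does not specify them in reproducible detail.

```python
import torch

# Hedged sketch of the quoted training objective L_loc = L_treg + λ·L_ta.
# The paper does not report λ (nor optimizer or learning-rate settings),
# so LAMBDA_TA is a placeholder, not the authors' value.
LAMBDA_TA = 1.0  # placeholder balancing hyperparameter

def localization_loss(l_treg: torch.Tensor, l_ta: torch.Tensor) -> torch.Tensor:
    """Combine the two quoted loss terms into L_loc = L_treg + λ·L_ta."""
    return l_treg + LAMBDA_TA * l_ta
```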