CoVR: Learning Composed Video Retrieval from Web Video Captions
Authors: Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments further demonstrate that training a CoVR model on our dataset effectively transfers to CoIR, leading to improved state-of-the-art performance in the zero-shot setup on both the CIRR and FashionIQ benchmarks. |
| Researcher Affiliation | Academia | 1 LIGM, École des Ponts, Univ Gustave Eiffel, CNRS, France; 2 Inria, ENS, CNRS, PSL Research University, France. lucas.ventura@enpc.fr |
| Pseudocode | No | The paper describes methods in prose and with figures, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code, datasets, and models are publicly available at https://imagine.enpc.fr/~ventural/covr. |
| Open Datasets | Yes | We apply our triplet generation approach to the WebVid2M dataset (Bain et al. 2021) which contains 2.5M Web-scraped video-caption pairs. Our code, datasets, and models are publicly available at https://imagine.enpc.fr/~ventural/covr. |
| Dataset Splits | Yes | The first dataset is divided into training, validation, and testing splits with 28225/16742, 4181/2265 and 4148/2178 queries/images, respectively. The second is divided into training and validation splits with 18000/45429 and 6016/15415 queries/images, respectively. |
| Hardware Specification | Yes | Experiments are conducted on 4 NVIDIA A100-SXM4-80GB GPUs. |
| Software Dependencies | No | The paper mentions models used (LLaMA 7B, BLIP, ViT-L), but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | We train our CoVR model on WebVid-CoVR for 4 epochs with a batch size of 2048 and an initial learning rate of 1e-5. To finetune on CIRR/FashionIQ, we train for 6 epochs with a batch size of 2048/1024 and an initial learning rate of 1e-4. We set hyperparameters based on the validation curve of WebVid-CoVR. |