CoVR: Learning Composed Video Retrieval from Web Video Captions

Authors: Lucas Ventura, Antoine Yang, Cordelia Schmid, Gül Varol

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments further demonstrate that training a CoVR model on our dataset effectively transfers to CoIR, leading to improved state-of-the-art performance in the zero-shot setup on both the CIRR and FashionIQ benchmarks.
Researcher Affiliation | Academia | ¹LIGM, École des Ponts, Univ Gustave Eiffel, CNRS, France; ²Inria, ENS, CNRS, PSL Research University, France. lucas.ventura@enpc.fr
Pseudocode | No | The paper describes methods in prose and with figures, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Our code, datasets, and models are publicly available at https://imagine.enpc.fr/~ventural/covr.
Open Datasets | Yes | We apply our triplet generation approach to the WebVid2M dataset (Bain et al. 2021) which contains 2.5M Web-scraped video-caption pairs. Our code, datasets, and models are publicly available at https://imagine.enpc.fr/~ventural/covr.
Dataset Splits | Yes | CIRR is divided into training, validation, and testing splits with 28225/16742, 4181/2265 and 4148/2178 queries/images, respectively; FashionIQ is divided into training and validation splits with 18000/45429 and 6016/15415 queries/images, respectively.
Hardware Specification | Yes | Experiments are conducted on 4 NVIDIA A100-SXM4-80GB GPUs.
Software Dependencies | No | The paper mentions models used (LLaMA 7B, BLIP, ViT-L), but does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions).
Experiment Setup | Yes | We train our CoVR model on WebVid-CoVR for 4 epochs with a batch size of 2048 and an initial learning rate of 1e-5. To finetune on CIRR/FashionIQ, we train for 6 epochs with a batch size of 2048/1024 and an initial learning rate of 1e-4. We set hyperparameters based on the validation curve of WebVid-CoVR.
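
For readers who want the reported training recipe at a glance, the hyperparameters quoted in the Experiment Setup row can be collected into a small config object. This is a minimal sketch, not the authors' code: the `TrainConfig` class and its field names are hypothetical, and only the numeric values (epochs, batch sizes, learning rates, GPU count) come from the paper.

```python
# Hypothetical training configs; structure and names are illustrative.
# Only the numeric values are taken from the paper's reported setup.
from dataclasses import dataclass

@dataclass
class TrainConfig:
    dataset: str
    epochs: int
    batch_size: int
    lr: float          # initial learning rate
    num_gpus: int = 4  # NVIDIA A100-SXM4-80GB, as reported

# Pretraining on WebVid-CoVR.
webvid_covr = TrainConfig("WebVid-CoVR", epochs=4, batch_size=2048, lr=1e-5)

# Fine-tuning for the composed image retrieval benchmarks.
cirr = TrainConfig("CIRR", epochs=6, batch_size=2048, lr=1e-4)
fashion_iq = TrainConfig("FashionIQ", epochs=6, batch_size=1024, lr=1e-4)

for cfg in (webvid_covr, cirr, fashion_iq):
    print(cfg)
```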
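
Similarly, the benchmark split sizes from the Dataset Splits row can be recorded in a lookup table for quick sanity checks. Again a sketch, under the assumption that the two quoted sentences describe CIRR and FashionIQ, respectively; the dict layout and names are illustrative.

```python
# Benchmark split sizes (queries / images) as quoted in the Dataset Splits row.
# The dict layout is illustrative; the numbers come from the paper.
SPLITS = {
    "CIRR": {
        "train": {"queries": 28_225, "images": 16_742},
        "val":   {"queries": 4_181,  "images": 2_265},
        "test":  {"queries": 4_148,  "images": 2_178},
    },
    "FashionIQ": {
        "train": {"queries": 18_000, "images": 45_429},
        "val":   {"queries": 6_016,  "images": 15_415},
    },
}

# Quick consistency check: CIRR queries across splits sum to 36,554.
total_cirr_queries = sum(s["queries"] for s in SPLITS["CIRR"].values())
print(total_cirr_queries)  # 36554
```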