reproducibilityindex.ai

QUEST: Quadruple Multimodal Contrastive Learning with Constraints and Self-Penalization

Authors: Qi Song, Tianxiang Gong, Shiqi Gao, Haoyi Zhou, Jianxin Li

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on multiple datasets show that our method achieves superior performance in multimodal contrastive learning benchmarks.
Researcher Affiliation	Academia	Qi Song¹, Tianxiang Gong², Shiqi Gao², Haoyi Zhou¹,³, Jianxin Li²,³; ¹School of Software, Beihang University; ²School of Computer Science and Engineering, Beihang University; ³Zhongguancun Laboratory, Beijing; {songqi23, gongtx, gaoshiqi, haoyi, lijx}@buaa.edu
Pseudocode	Yes	Algorithm 1: L_UIC loss calculation; Algorithm 2: Calculate similarity map; Algorithm 3: L_cos loss calculation
Open Source Code	Yes	We provide source code of our paper. https://github.com/Vortexsong/QUEST
Open Datasets	Yes	Flickr30k is a benchmark commonly used in computer vision (CV) and natural language processing (NLP)... Microsoft Common Objects in Context (MS-COCO) is a large-scale dataset... Free Music Archive (FMA) is an extensive, open-access dataset... GTZAN is a benchmark dataset widely used in Music Information Retrieval (MIR)... Clotho: an audio captioning dataset... AudioCaps is a seminal dataset for audio captioning...
Dataset Splits	Yes	FMA's comprehensive nature makes it ideal for various MIR tasks such as genre classification, artist identification, and music recommendation, while its predefined train/validation/test splits and subsets of varying sizes facilitate reproducible research and benchmarking in the field.
Hardware Specification	Yes	All experiments in this paper are run on a single NVIDIA A100 GPU.
Software Dependencies	Yes	The implementation is based on PyTorch 2.0.1.
Experiment Setup	Yes	Table 3: Multimodal Model Training Details. ... VSE++ 30 128 adam 2e-4 0 step LR ... CLIP 5 256 adamw 2e-5 100 cosine_annealing. ... We choose the hyperparameters alpha_t as 0.08 on most experiments and set positive_sample to false.
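
The two Table 3 rows quoted above can be read as training configurations. The sketch below is one hedged way to encode them, not the authors' code. The column meanings (epochs, batch size, optimizer, learning rate, warmup steps, scheduler) are assumptions inferred from the flattened row. `TrainConfig`, `lr_at`, `step_size`, and `gamma` are hypothetical names and values that do not appear in the source. Only `alpha_t = 0.08` and `positive_sample = false` are taken directly from the quoted text.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    # Column meanings are assumed from the flattened Table 3 row.
    model: str
    epochs: int            # assumed meaning of the first number
    batch_size: int        # assumed meaning of the second number
    optimizer: str
    base_lr: float
    warmup_steps: int      # assumed meaning of the 0 / 100 column
    scheduler: str
    alpha_t: float = 0.08          # quoted: used in most experiments
    positive_sample: bool = False  # quoted: set to false


VSE_PP = TrainConfig("VSE++", 30, 128, "adam", 2e-4, 0, "step LR")
CLIP = TrainConfig("CLIP", 5, 256, "adamw", 2e-5, 100, "cosine_annealing")


def lr_at(cfg, step, total_steps, step_size=10, gamma=0.1):
    """Return the learning rate at a 0-indexed scheduler step.

    step_size and gamma are hypothetical; the source does not state them.
    """
    if step < cfg.warmup_steps:
        # Linear warmup that reaches base_lr on the last warmup step.
        return cfg.base_lr * (step + 1) / cfg.warmup_steps
    if cfg.scheduler == "cosine_annealing":
        # Standard cosine decay from base_lr to 0 after warmup.
        span = max(1, total_steps - cfg.warmup_steps)
        progress = (step - cfg.warmup_steps) / span
        return 0.5 * cfg.base_lr * (1.0 + math.cos(math.pi * progress))
    if cfg.scheduler == "step LR":
        # Multiply the rate by gamma once every step_size steps.
        return cfg.base_lr * gamma ** (step // step_size)
    raise ValueError(f"unknown scheduler: {cfg.scheduler}")
```

Calling `lr_at(CLIP, step, total_steps)` for each step traces a warmup followed by a cosine decay, which is a quick way to sanity-check a schedule before launching a run on the reported single A100.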