Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning

Authors: Bang Yang, Yong Dai, Xuxin Cheng, Yaowei Li, Asif Raza, Yuexian Zou

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments verify the effectiveness of CLL-CLIP and show that our approach can boost CLL-CLIP, e.g., by 6.7% in text-to-image average Recall@1 on XM3600, and improve various state-of-the-art methods consistently.
Researcher Affiliation | Academia | 1 ADSPLAB, School of ECE, Peking University, Shenzhen, China; 2 Pengcheng Laboratory, Shenzhen, China. {yangbang, chengxx, ywl, asifraza151, zouyx}@pku.edu.cn, chd-dy@foxmail.com
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | Our code and data are available at https://github.com/yangbang18/CLFM.
Open Datasets | Yes | We build a CLL benchmark based on MSCOCO (Chen et al. 2015) and XM3600 (Thapliyal et al. 2022) to evaluate the effectiveness of our proposals. ... We train models on MSCOCO36 based on the Karpathy split (Karpathy and Fei-Fei 2015). ... Table 1: MSCOCO36 (Train/Val/Test Images 113,287/5,000/5,000).
Dataset Splits | Yes | Table 1: MSCOCO36 (Train/Val/Test Images 113,287/5,000/5,000). ... We train models on MSCOCO36 based on the Karpathy split (Karpathy and Fei-Fei 2015). ... The model achieving the highest summation of Recall@{1, 5, 10} on the current-task validation set is selected for training on the next task.
Hardware Specification | Yes | We conduct experiments in PyTorch on a single NVIDIA V100 card and every run of an experiment takes less than 20 hours.
Software Dependencies | No | The paper mentions 'PyTorch' but does not provide a specific version number. No other specific software dependencies with version numbers are listed.
Experiment Setup | Yes | We set the initial temperature of Lcm to 0.07. We search the hyperparameters γ1 and γ2 in Equation (4) from values {1, 0.1, 0.01} and set γ1 = 0.01 and γ2 = 1 based on the AR metric on the validation set. ... For each task, we set the vocab size to 10K. We use batches of 128 samples and AdamW (Loshchilov and Hutter 2019) with L2 weight decay of 0.05 to train models for 3 epochs. We set the learning rate fixed to 5e-5 after 10% warm-up iterations.
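The Recall@{1, 5, 10} retrieval metrics quoted above (used both for the headline XM3600 result and for per-task checkpoint selection) can be computed from a query-to-image similarity matrix. Below is a minimal sketch, not the paper's implementation; the function name `recall_at_k` and the toy similarity values are illustrative, and the usual paired-benchmark convention is assumed (the ground-truth image for query i is image i):

```python
def recall_at_k(sim, ks=(1, 5, 10)):
    """Text-to-image Recall@K from a similarity matrix.

    sim[i][j] is the similarity of text query i to image j; the
    ground-truth image for query i is assumed to be image i
    (the usual convention for paired retrieval benchmarks).
    """
    n = len(sim)
    ranks = []
    for i, row in enumerate(sim):
        order = sorted(range(n), key=lambda j: -row[j])  # best match first
        ranks.append(order.index(i))  # position of the ground-truth image
    return {k: sum(r < k for r in ranks) / n for k in ks}

# Toy example: 3 queries; the correct image is ranked first for two of them.
sim = [[0.9, 0.1, 0.0],
       [0.2, 0.8, 0.1],
       [0.7, 0.6, 0.5]]
scores = recall_at_k(sim)
# The paper selects the checkpoint with the highest summation of
# Recall@{1, 5, 10} on the current-task validation set.
selection_score = sum(scores.values())
```

"Average Recall@1" in the Research Type row is then the mean of this Recall@1 over the 36 XM3600 languages.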
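The Experiment Setup row specifies a compact training recipe: AdamW with weight decay 0.05, batches of 128, 3 epochs, initial contrastive temperature 0.07, and a learning rate fixed at 5e-5 after 10% linear warm-up. A hedged sketch of that schedule follows; the constant names and the helper `lr_at_step` are hypothetical, and linear warm-up is assumed since the paper only says "after 10% warm-up iterations":

```python
BASE_LR = 5e-5           # learning rate, fixed after warm-up
WARMUP_FRACTION = 0.10   # first 10% of iterations warm up (assumed linear)
BATCH_SIZE = 128
EPOCHS = 3
INIT_TEMPERATURE = 0.07  # initial temperature of the contrastive loss Lcm
WEIGHT_DECAY = 0.05      # L2 weight decay passed to AdamW

def lr_at_step(step, total_steps):
    """Learning rate at a given iteration: linear warm-up, then constant."""
    warmup_steps = max(1, int(WARMUP_FRACTION * total_steps))
    if step < warmup_steps:
        return BASE_LR * (step + 1) / warmup_steps
    return BASE_LR
```

With, say, 1,000 total iterations, the first 100 steps ramp linearly from 5e-7 up to 5e-5, and every later step uses 5e-5 exactly, matching the "fixed to 5e-5 after 10% warm-up iterations" description.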