Embracing Language Inclusivity and Diversity in CLIP through Continual Language Learning
Authors: Bang Yang, Yong Dai, Xuxin Cheng, Yaowei Li, Asif Raza, Yuexian Zou
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments verify the effectiveness of CLL-CLIP and show that our approach can boost CLL-CLIP, e.g., by 6.7% in text-to-image average Recall@1 on XM3600, and improve various state-of-the-art methods consistently. |
| Researcher Affiliation | Academia | 1 ADSPLAB, School of ECE, Peking University, Shenzhen, China 2 Pengcheng Laboratory, Shenzhen, China {yangbang, chengxx, ywl, asifraza151, zouyx}@pku.edu.cn, chd-dy@foxmail.com |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code and data are available at https://github.com/yangbang18/CLFM. |
| Open Datasets | Yes | We build a CLL benchmark based on MSCOCO (Chen et al. 2015) and XM3600 (Thapliyal et al. 2022) to evaluate the effectiveness of our proposals. ... We train models on MSCOCO36 based on the Karpathy split (Karpathy and Fei-Fei 2015). ... Table 1: MSCOCO36 (Train/Val/Test Images 113,287/5,000/5,000). |
| Dataset Splits | Yes | Table 1: MSCOCO36 (Train/Val/Test Images 113,287/5,000/5,000). ... We train models on MSCOCO36 based on the Karpathy split (Karpathy and Fei-Fei 2015). ... The model achieving the highest summation of Recall@{1, 5, 10} on the current-task validation set is selected for training on the next task. |
| Hardware Specification | Yes | We conduct experiments in PyTorch on a single NVIDIA V100 card and every run of an experiment takes less than 20 hours. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not provide a specific version number. No other specific software dependencies with version numbers are listed. |
| Experiment Setup | Yes | We set the initial temperature of L_cm to 0.07. We search the hyperparameters γ1 and γ2 in Equation (4) from values {1, 0.1, 0.01} and set γ1 = 0.01 and γ2 = 1 based on the AR metric on the validation set. ... For each task, we set the vocab size to 10K. We use batches of 128 samples and AdamW (Loshchilov and Hutter 2019) with L2 weight decay of 0.05 to train models for 3 epochs. We set the learning rate fixed to 5e-5 after 10% warm-up iterations. |
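
The Experiment Setup row pins down most of the optimization recipe (AdamW, weight decay 0.05, batches of 128, 3 epochs, lr fixed at 5e-5 after 10% warm-up, temperature initialized to 0.07). As a companion, here is a minimal PyTorch sketch of that recipe under stated assumptions: the placeholder `model`, the CLIP-style learnable log-scale temperature, and the per-epoch step count (one sample per MSCOCO36 training image) are assumptions, not the authors' code.

```python
import math
import torch

# Placeholder standing in for CLL-CLIP's trainable parameters (assumption).
model = torch.nn.Linear(512, 512)

# CLIP parameterizes the softmax temperature via a learnable log-scale,
# so exp(logit_scale) = 1 / 0.07 at initialization.
logit_scale = torch.nn.Parameter(torch.tensor(math.log(1 / 0.07)))

# AdamW with L2 weight decay of 0.05, as quoted above. (Applying decay to
# logit_scale as well is a simplification of this sketch.)
optimizer = torch.optim.AdamW(
    list(model.parameters()) + [logit_scale], lr=5e-5, weight_decay=0.05
)

# MSCOCO36 train split: 113,287 images; batches of 128 samples, 3 epochs.
steps_per_epoch = math.ceil(113_287 / 128)
total_steps = 3 * steps_per_epoch
warmup_steps = int(0.1 * total_steps)  # "10% warm-up iterations"

# Linear warm-up to 5e-5; the multiplier then saturates at 1.0, matching
# "learning rate fixed to 5e-5 after 10% warm-up iterations".
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lambda step: min(1.0, (step + 1) / max(1, warmup_steps))
)
```

With `scheduler.step()` called once per training iteration, the learning rate ramps linearly over the first 10% of the step budget and stays constant thereafter.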
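The Recall@K figures quoted in the Research Type and Dataset Splits rows can be computed as below. This is a generic sketch of text-to-image Recall@K, not code from the CLFM repository; the embedding and index variable names are assumptions.

```python
import torch

def recall_at_k(text_emb: torch.Tensor, image_emb: torch.Tensor,
                gt_image: torch.Tensor, ks=(1, 5, 10)) -> dict:
    """text_emb: (T, D) and image_emb: (I, D), both L2-normalized;
    gt_image: (T,) long tensor with the ground-truth image index per text."""
    sims = text_emb @ image_emb.T               # (T, I) cosine similarities
    gt_sim = sims.gather(1, gt_image[:, None])  # score of each query's true image
    rank = (sims > gt_sim).sum(dim=1)           # 0-based retrieval rank of the true image
    return {f"R@{k}": (rank < k).float().mean().item() for k in ks}
```

The checkpoint-selection rule quoted in the Dataset Splits row ("highest summation of Recall@{1, 5, 10} on the current-task validation set") then reduces to scoring each saved checkpoint with `sum(recall_at_k(text_emb, image_emb, gt_image).values())` and keeping the maximizer.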