Continual Vision-Language Representation Learning with Off-Diagonal Information
Authors: Zixuan Ni, Longhui Wei, Siliang Tang, Yueting Zhuang, Qi Tian
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on commonly used datasets with different scales and scopes have demonstrated the effectiveness of our method. |
| Researcher Affiliation | Collaboration | Zhejiang University; Huawei Cloud. |
| Pseudocode | No | The paper describes its method using text and mathematical equations (e.g., equations 1-5) but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks or figures. |
| Open Source Code | No | The paper mentions 'OpenAI source code (OpenAI)' in Section 3 for the CLIP baseline setup, but it does not include an explicit statement from the authors about releasing their own code for the Mod-X framework or a direct link to their repository. |
| Open Datasets | Yes | MS COCO Captions (Lin et al., 2014): MS COCO Captions (COCO) is a widely used image caption dataset. ... Flickr30K (Young et al., 2014): Flickr30K contains 30K training images... ECommerce-T2I (Yang et al., 2021) is a text-to-image e-commerce dataset... |
| Dataset Splits | Yes | MS COCO Captions (Lin et al., 2014): MS COCO Captions (COCO) is a widely used image caption dataset. It contains 80K training images, 30K validation images, and 5K testing images (COCO(5K)). |
| Hardware Specification | Yes | All of the experiments are conducted on 8 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions software components like the 'AdamW' optimizer and refers to the 'OpenAI source code' for CLIP, but it does not specify version numbers for programming languages, libraries, or frameworks (e.g., 'PyTorch 1.9', 'Python 3.8'). |
| Experiment Setup | Yes | In the exploration experiments (Section 3) and Experiment 5.2, we use the hyper-parameters shown in Table 3(a). Since Experiment 5.3 starts from the pre-trained ViT-B/32 model (OpenAI), we lower the learning rate from 5e-4 to 1e-6; the other hyper-parameters are consistent with Experiment 5.2 and CLIP (OpenAI). (See the optimizer sketch below.) |
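
The Experiment Setup row fixes only the learning rates (5e-4 when training from scratch, 1e-6 when fine-tuning the pre-trained ViT-B/32) and defers the remaining settings to CLIP's published hyper-parameters. The snippet below is a minimal PyTorch sketch, not the authors' released code, of how that optimizer configuration could be reproduced; the placeholder module, the `from_scratch` flag, and the AdamW betas/epsilon/weight-decay values (taken from the CLIP paper's ViT settings) are assumptions.

```python
import torch
import torch.nn as nn

# Placeholder module standing in for the pre-trained CLIP ViT-B/32 encoders;
# in practice this would be the model loaded from the OpenAI CLIP release.
model = nn.Linear(512, 512)

# Hypothetical flag: the exploration experiments / Experiment 5.2 train from
# scratch, while Experiment 5.3 fine-tunes the pre-trained checkpoint with a
# smaller learning rate, as stated in the Experiment Setup row.
from_scratch = False
lr = 5e-4 if from_scratch else 1e-6

optimizer = torch.optim.AdamW(
    model.parameters(),
    lr=lr,
    betas=(0.9, 0.98),   # assumption: CLIP's published ViT betas
    eps=1e-6,            # assumption: CLIP's published ViT epsilon
    weight_decay=0.2,    # assumption: CLIP's published weight decay
)
```

Everything other than the two learning rates is inferred from the row's statement that the remaining hyper-parameters follow Experiment 5.2 and CLIP (OpenAI), so exact values should be checked against Table 3(a) of the paper.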