Continual Vision-Language Retrieval via Dynamic Knowledge Rectification
Authors: Zhenyu Cui, Yuxin Peng, Xun Wang, Manyu Zhu, Jiahuan Zhou
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several benchmark datasets demonstrate the effectiveness of our DKR and its superiority against state-of-the-art methods. ... In this section, we conduct extensive experiments to validate the effectiveness of our proposed DKR. ... Ablation Study: In this section, we conduct ablation studies under Setting-1 to evaluate the effectiveness of each component of DKR and the effect of the hyperparameter. |
| Researcher Affiliation | Collaboration | Zhenyu Cui1, Yuxin Peng1*, Xun Wang2, Manyu Zhu2, Jiahuan Zhou1 — 1Wangxuan Institute of Computer Technology, Peking University; 2ByteDance Inc. |
| Pseudocode | Yes | Algorithm 1: Dynamic Knowledge Rectification (DKR) |
| Open Source Code | No | The paper does not state that source code is available, nor does it provide a link to a repository for the DKR implementation. |
| Open Datasets | Yes | 1) MS-COCO Caption: MS-COCO Caption (MS-COCO) (Lin et al. 2014) is a widely used image caption dataset. 2) Flickr30K: Flickr30K (Young et al. 2014) contains 31,783 images from the Flickr website... 3) IAPR TC-12: IAPR TC-12 (Grubinger et al. 2006) consists of 20,000 images... 4) ECommerce-T2I: ECommerce-T2I (EC) (Yang et al. 2021) is a large-scale e-commerce product retrieval dataset. 5) RSICD: RSICD (Lu et al. 2017) is a remote sensing image retrieval dataset... |
| Dataset Splits | Yes | MS-COCO Caption (MS-COCO) (Lin et al. 2014) is a widely used image caption dataset. It contains 80K training images and 5K testing images, where each image has five captions. 2) Flickr30K: Flickr30K (Young et al. 2014) contains 31,783 images from the Flickr website, and each image is annotated with 5 sentences. We use 30K images as the training set and the remaining 1K images as the testing set. To further evaluate the performance on the specific dataset, we follow the benchmark in (Ni et al. 2023), which randomly and uniformly divides the EC dataset into 5 sub-datasets, and sequentially train on these 5 sub-datasets (see the split sketch after the table). |
| Hardware Specification | Yes | The proposed DKR is implemented in PyTorch with NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions PyTorch but does not specify a version number. It also references CLIP without giving a specific software version for replication. |
| Experiment Setup | Yes | Each task is trained for 35 epochs with a batch size of 280. We use the Adam optimizer with (β1, β2) = (0.9, 0.99) and a weight decay of 0.2 to update the whole CLIP model. The initial learning rate is set to 1e-6 with 20% warm-up iterations, and a cosine-decay learning rate scheduler is used to update the whole framework. The hyperparameter λ is set to 1.0 and 0.1 for Setting-1 and Setting-2, respectively. (See the hedged configuration sketches below.) |
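
For the continual benchmark from (Ni et al. 2023) quoted in the Dataset Splits row, the following is a minimal sketch of a random, uniform 5-way split with sequential per-task training. `ec_dataset`, `model`, and `train_one_task` are hypothetical placeholders, not artifacts from the paper:

```python
import torch
from torch.utils.data import random_split

# Hypothetical handle to the ECommerce-T2I training set (not from the paper).
num_tasks = 5
lengths = [len(ec_dataset) // num_tasks] * num_tasks
lengths[-1] += len(ec_dataset) - sum(lengths)  # absorb any remainder in the last task

generator = torch.Generator().manual_seed(0)   # fixed seed for a reproducible split
sub_datasets = random_split(ec_dataset, lengths, generator=generator)

# Sequential (continual) training over the 5 sub-datasets.
for task_id, task_data in enumerate(sub_datasets):
    train_one_task(model, task_data)  # hypothetical per-task training routine
```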
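The optimizer and schedule in the Experiment Setup row map directly onto standard PyTorch APIs. Below is a minimal sketch using the reported hyperparameters; `model` and `steps_per_epoch` are placeholders, and the linear warm-up shape is an assumption, since the paper only states 20% warm-up iterations followed by cosine decay:

```python
import math
import torch

# Reported hyperparameters; `model` and `steps_per_epoch` are placeholders.
# The batch size of 280 would be configured on the DataLoader.
epochs = 35
base_lr, weight_decay = 1e-6, 0.2

optimizer = torch.optim.Adam(model.parameters(), lr=base_lr,
                             betas=(0.9, 0.99), weight_decay=weight_decay)

total_steps = epochs * steps_per_epoch
warmup_steps = int(0.2 * total_steps)  # "20% warm-up iterations"

def lr_lambda(step):
    # Linear warm-up (an assumption), then cosine decay to zero.
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
```

Under this reading, `scheduler.step()` is called once per training iteration rather than per epoch, so the 20% warm-up is counted in iterations as the paper states.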
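Since no code is released, the role of λ can only be illustrated generically. The sketch below shows the common pattern of weighting a knowledge-preservation term against the new-task retrieval loss; it is not the paper's DKR algorithm, and `loss_retrieval` / `loss_rectify` are hypothetical names:

```python
# Generic continual-learning loss combination; NOT the paper's DKR.
# loss_retrieval: contrastive loss on the current task's image-text pairs.
# loss_rectify:   a knowledge-preservation term (e.g., distillation from the
#                 previous model), standing in for DKR's rectification loss.
lam = 1.0  # λ = 1.0 for Setting-1, 0.1 for Setting-2, per the setup row

loss = loss_retrieval + lam * loss_rectify
loss.backward()
optimizer.step()
optimizer.zero_grad()
```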