Continual Vision-Language Retrieval via Dynamic Knowledge Rectification

Authors: Zhenyu Cui, Yuxin Peng, Xun Wang, Manyu Zhu, Jiahuan Zhou

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments on several benchmark datasets demonstrate the effectiveness of our DKR and its superiority against state-of-the-art methods." "In this section, we conduct extensive experiments to validate the effectiveness of our proposed DKR." "Ablation Study: In this section, we conduct ablation studies under Setting-1 to evaluate the effectiveness of each component of DKR and the effect of the hyperparameter."
Researcher Affiliation | Collaboration | Zhenyu Cui (1), Yuxin Peng (1*), Xun Wang (2), Manyu Zhu (2), Jiahuan Zhou (1); (1) Wangxuan Institute of Computer Technology, Peking University; (2) ByteDance Inc.
Pseudocode | Yes | Algorithm 1: Dynamic Knowledge Rectification (DKR). (An illustrative, heavily hedged sketch of what a rectified-distillation training step could look like is given after the table.)
Open Source Code | No | The paper does not explicitly state that the source code is available, nor does it provide a link to a repository for the DKR implementation.
Open Datasets | Yes | 1) MS-COCO Caption: MS-COCO Caption (MS-COCO) (Lin et al. 2014) is a widely used image caption dataset. 2) Flickr30K: Flickr30K (Young et al. 2014) contains 31,783 images from the Flickr website... 3) IAPR TC-12: IAPR TC-12 (Grubinger et al. 2006) consists of 20,000 images... 4) ECommerce-T2I: ECommerce-T2I (EC) (Yang et al. 2021) is a large-scale e-commerce product retrieval dataset. 5) RSICD: RSICD (Lu et al. 2017) is a remote sensing image retrieval dataset...
Dataset Splits | Yes | MS-COCO Caption (MS-COCO) (Lin et al. 2014) is a widely used image caption dataset. It contains 80K training images and 5K testing images, where each image has five captions. 2) Flickr30K: Flickr30K (Young et al. 2014) contains 31,783 images from the Flickr website, and each image is annotated with 5 sentences. We use 30K images as the training set and the remaining 1K images as the testing set. To further evaluate performance on a specific dataset, we follow the benchmark in (Ni et al. 2023), which randomly and uniformly divides the EC dataset into 5 sub-datasets, and sequentially train on these 5 sub-datasets. (A minimal sketch of this sequential split construction is given after the table.)
Hardware Specification | Yes | The proposed DKR is implemented in PyTorch with NVIDIA V100 GPUs.
Software Dependencies | No | The paper mentions PyTorch but does not specify a version number. It also references CLIP without specifying a version that would allow exact replication.
Experiment Setup | Yes | Each task is trained for 35 epochs with a batch size of 280. We use the Adam optimizer with (β1, β2) = (0.9, 0.99) and a weight decay of 0.2 to update the whole CLIP model. The initial learning rate is set to 1e-6 with 20% warm-up iterations, and a cosine-decay learning rate scheduler is used to update the whole framework. The hyperparameter λ is set to 1.0 and 0.1 for Setting-1 and Setting-2, respectively. (A PyTorch sketch of this optimization recipe is given after the table.)
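
Pseudocode note. The report only names Algorithm 1; the paper's pseudocode is not reproduced above. Purely as an illustration of what a "knowledge rectification" style training step could look like in a continual retrieval setting, the sketch below gates a distillation term on whether the frozen old model still ranks the ground-truth pair highest, and otherwise relies on the ground-truth contrastive target. This is not the authors' Algorithm 1; all names (new_model, old_model, lambda_weight) and the dual-encoder API are assumptions made for the sketch.

```python
import torch
import torch.nn.functional as F

def rectified_distillation_step(new_model, old_model, images, texts, lambda_weight=1.0):
    """Illustrative continual-retrieval step: distill from the old model only
    where its knowledge is still correct, otherwise rely on ground truth."""
    # Assumed dual-encoder API: model(images, texts) -> (image_emb, text_emb).
    img_emb, txt_emb = new_model(images, texts)
    logits_new = img_emb @ txt_emb.t()
    # Ground-truth image-text pairs sit on the diagonal of the similarity matrix.
    targets = torch.arange(logits_new.size(0), device=logits_new.device)
    loss_task = F.cross_entropy(logits_new, targets)  # new-task retrieval loss

    with torch.no_grad():
        old_img, old_txt = old_model(images, texts)
        logits_old = old_img @ old_txt.t()
        # "Rectification": keep the old model's knowledge only for samples it
        # still retrieves correctly; erroneous rows are excluded from distillation.
        keep = logits_old.argmax(dim=1).eq(targets)

    if keep.any():
        loss_distill = F.kl_div(
            F.log_softmax(logits_new[keep], dim=1),
            F.softmax(logits_old[keep], dim=1),
            reduction="batchmean",
        )
    else:
        loss_distill = logits_new.new_zeros(())

    return loss_task + lambda_weight * loss_distill
```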
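Dataset splits note. The benchmark followed for EC divides the dataset "randomly and uniformly" into 5 sub-datasets that are trained on one after another. A minimal sketch of such a split, assuming a standard list-style PyTorch dataset and using torch.utils.data.Subset (variable names are illustrative, not the benchmark's code):

```python
import torch
from torch.utils.data import Subset

def make_sequential_tasks(dataset, num_tasks=5, seed=0):
    """Randomly and uniformly partition a dataset into `num_tasks` disjoint
    sub-datasets, to be trained on sequentially in the continual setting."""
    g = torch.Generator().manual_seed(seed)
    perm = torch.randperm(len(dataset), generator=g).tolist()
    chunk = len(dataset) // num_tasks
    tasks = []
    for t in range(num_tasks):
        start = t * chunk
        end = (t + 1) * chunk if t < num_tasks - 1 else len(dataset)
        tasks.append(Subset(dataset, perm[start:end]))
    return tasks

# Usage: train on tasks[0], then tasks[1], ..., evaluating forgetting after each stage.
```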
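Experiment setup note. The optimization recipe is fully specified in the row above (35 epochs, batch size 280, Adam with betas (0.9, 0.99) and weight decay 0.2, initial learning rate 1e-6, 20% warm-up followed by cosine decay). The sketch below is one common way to realize that schedule in PyTorch, not the authors' code; the helper name and warm-up implementation are assumptions.

```python
import math
import torch

def build_optimizer_and_scheduler(model, total_iters, warmup_frac=0.2):
    """Adam plus linear warm-up and cosine decay, matching the reported setup."""
    optimizer = torch.optim.Adam(
        model.parameters(), lr=1e-6, betas=(0.9, 0.99), weight_decay=0.2
    )
    warmup_iters = int(warmup_frac * total_iters)

    def lr_lambda(it):
        if it < warmup_iters:
            return (it + 1) / max(1, warmup_iters)               # linear warm-up
        progress = (it - warmup_iters) / max(1, total_iters - warmup_iters)
        return 0.5 * (1.0 + math.cos(math.pi * progress))         # cosine decay

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler

# e.g. total_iters = 35 epochs * (len(train_set) // 280) iterations per epoch
```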