Cross-Modal Coherence for Text-to-Image Retrieval

Authors: Malihe Alikhani, Fangda Han, Hareesh Ravi, Mubbasir Kapadia, Vladimir Pavlovic, Matthew Stone

AAAI 2022, pp. 10427-10435

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our analysis shows that models trained with image-text coherence relations can retrieve images originally paired with target text more often than coherence-agnostic models. We also show via human evaluation that images retrieved by the proposed coherence-aware model are preferred over a coherence-agnostic baseline by a huge margin. (A retrieval-evaluation sketch follows the table.)
Researcher Affiliation | Academia | 1 University of Pittsburgh, 2 Rutgers University
Pseudocode | No | The paper describes the model architecture and steps but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code, contact and data: https://github.com/klory/Cross-Modal-Coherence-for-Text-to-Image-Retrieval
Open Datasets | Yes | We study the efficacy of CMCM for image retrieval by leveraging two image-text datasets, CITE++ and Clue (Alikhani et al. 2020), that are annotated with image-text coherence relations.
Dataset Splits | Yes | We split the CITE++ dataset as 3439/860 for training/testing and the Clue dataset as 6047/1512 for training/testing; 10% of the training data is used as validation. (A split-construction sketch follows the table.)
Hardware Specification | No | The paper describes the network details and experimental setup but does not specify any particular GPU models, CPU types, or other hardware used for running experiments.
Software Dependencies | No | The paper mentions software components such as ResNet50, word2vec, LSTM, and Gensim, but does not provide specific version numbers for these or any other software dependencies needed for replication. (A model-assembly sketch using these components follows the table.)
Experiment Setup | No | Further training and hyperparameter details are given in the appendix.
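
The "Research Type" row above summarizes the central experimental claim: a coherence-aware model retrieves the originally paired image more often than a coherence-agnostic baseline. A minimal sketch of the kind of text-to-image recall check this implies is given below; the cosine-similarity scoring and the `recall_at_k` helper are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# Sketch of a text-to-image retrieval check: for each caption, rank all
# candidate images by embedding similarity and count how often the originally
# paired image appears in the top k. (Assumed metric; the paper defines its
# own evaluation protocol.)
import torch
import torch.nn.functional as F

def recall_at_k(text_emb, image_emb, k=1):
    """text_emb[i] and image_emb[i] are embeddings of an originally paired example."""
    sims = F.normalize(text_emb, dim=-1) @ F.normalize(image_emb, dim=-1).T
    topk = sims.topk(k, dim=-1).indices                     # (N, k) retrieved image ids
    targets = torch.arange(text_emb.size(0)).unsqueeze(-1)  # ground-truth pairing
    return (topk == targets).any(dim=-1).float().mean().item()
```

Under this kind of check, the coherence-aware model is reported to outscore the coherence-agnostic baseline.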
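
The "Dataset Splits" row reports 3439/860 train/test examples for CITE++, 6047/1512 for Clue, and a 10% validation hold-out from the training data. Below is a minimal sketch of one way to reproduce such splits; the `load_examples` helper and the fixed seed are hypothetical, and any actual split files would live in the linked repository.

```python
# Minimal split-construction sketch matching the counts in the row above.
# `load_examples` is a hypothetical placeholder for dataset loading.
import random

def make_splits(examples, n_train, n_test, val_fraction=0.1, seed=0):
    """Shuffle examples, carve out train/test, then hold out validation from train."""
    assert len(examples) >= n_train + n_test
    rng = random.Random(seed)
    examples = examples[:]
    rng.shuffle(examples)
    train, test = examples[:n_train], examples[n_train:n_train + n_test]
    n_val = int(val_fraction * len(train))
    return train[n_val:], train[:n_val], test  # train, val, test

# Example usage with the reported counts:
# cite_train, cite_val, cite_test = make_splits(load_examples("cite++"), 3439, 860)
# clue_train, clue_val, clue_test = make_splits(load_examples("clue"), 6047, 1512)
```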
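
The "Software Dependencies" row names the building blocks (ResNet50, word2vec, LSTM, Gensim) without versions. The sketch below shows how such components could be assembled into a coherence-aware dual encoder with a relation-prediction head; the layer sizes, fusion by concatenation, and class/argument names are assumptions, since the exact CMCM architecture and hyperparameters are specified in the paper and its appendix.

```python
# A minimal PyTorch sketch of a coherence-aware dual-encoder retriever:
# ResNet50 image encoder, LSTM text encoder over word2vec-style embeddings,
# and a head that predicts the image-text coherence relation. Sizes and
# names are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torchvision.models as models

class CoherenceAwareRetriever(nn.Module):
    def __init__(self, vocab_size, num_relations, embed_dim=300,
                 hidden_dim=512, joint_dim=512):
        super().__init__()
        # Image branch: ResNet50 backbone with its classifier removed.
        resnet = models.resnet50(weights=None)  # or ImageNet weights, if desired
        self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.image_proj = nn.Linear(2048, joint_dim)
        # Text branch: embedding layer (could be initialized from word2vec) + LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.text_proj = nn.Linear(hidden_dim, joint_dim)
        # Coherence head: predicts the image-text coherence relation from the
        # joint representation (this is what makes the model "coherence-aware").
        self.relation_head = nn.Linear(2 * joint_dim, num_relations)

    def forward(self, images, token_ids):
        img = self.image_proj(self.image_encoder(images).flatten(1))
        _, (h, _) = self.lstm(self.embed(token_ids))
        txt = self.text_proj(h[-1])
        relation_logits = self.relation_head(torch.cat([img, txt], dim=-1))
        return img, txt, relation_logits  # retrieval embeddings + relation prediction
```

At training time one would typically combine a retrieval loss over the projected embeddings with a cross-entropy loss on the relation logits; the paper's appendix gives the actual objectives and hyperparameters.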