Cross-Modal Coherence for Text-to-Image Retrieval

Authors: Malihe Alikhani, Fangda Han, Hareesh Ravi, Mubbasir Kapadia, Vladimir Pavlovic, Matthew Stone

AAAI 2022, pp. 10427-10435

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our analysis shows that models trained with image-text coherence relations can retrieve images originally paired with target text more often than coherence-agnostic models. We also show via human evaluation that images retrieved by the proposed coherence-aware model are preferred over a coherence-agnostic baseline by a huge margin. (A retrieval-evaluation sketch follows the table.)
Researcher Affiliation | Academia | 1 University of Pittsburgh, 2 Rutgers University
Pseudocode | No | The paper describes the model architecture and steps but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | Yes | Code, contact and data: https://github.com/klory/Cross-Modal-Coherence-for-Text-to-Image-Retrieval
Open Datasets | Yes | We study the efficacy of CMCM for image retrieval by leveraging two image-text datasets, CITE++ and Clue (Alikhani et al. 2020), that are annotated with image-text coherence relations.
Dataset Splits | Yes | We split the CITE++ dataset as 3439/860 for training/testing and the Clue dataset as 6047/1512 for training/testing; 10% of the training data is used as validation. (A split-construction sketch follows the table.)
Hardware Specification | No | The paper describes the network details and experimental setup but does not specify any particular GPU models, CPU types, or other hardware used for running experiments.
Software Dependencies | No | The paper mentions software components such as ResNet50, word2vec, LSTM, and Gensim, but does not provide specific version numbers for these or any other software dependencies needed for replication. (A model-assembly sketch using these components follows the table.)
Experiment Setup | No | Further training and hyperparameter details are given in the appendix.
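
The "Research Type" row above summarizes the central experimental claim: a coherence-aware model retrieves the originally paired image more often than a coherence-agnostic baseline. A minimal sketch of the kind of text-to-image recall check this implies is given below; the cosine-similarity scoring and the `recall_at_k` helper are illustrative assumptions, not the paper's exact evaluation protocol.

```python
# Sketch of a text-to-image retrieval check: for each caption, rank all
# candidate images by embedding similarity and count how often the originally
# paired image appears in the top k. (Assumed metric; the paper defines its
# own evaluation protocol.)
import torch
import torch.nn.functional as F

def recall_at_k(text_emb, image_emb, k=1):
    """text_emb[i] and image_emb[i] are embeddings of an originally paired example."""
    sims = F.normalize(text_emb, dim=-1) @ F.normalize(image_emb, dim=-1).T
    topk = sims.topk(k, dim=-1).indices                     # (N, k) retrieved image ids
    targets = torch.arange(text_emb.size(0)).unsqueeze(-1)  # ground-truth pairing
    return (topk == targets).any(dim=-1).float().mean().item()
```

Under this kind of check, the coherence-aware model is reported to outscore the coherence-agnostic baseline.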
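
The "Dataset Splits" row reports 3439/860 train/test examples for CITE++, 6047/1512 for Clue, and a 10% validation hold-out from the training data. Below is a minimal sketch of one way to reproduce such splits; the `load_examples` helper and the fixed seed are hypothetical, and any actual split files would live in the linked repository.

```python
# Minimal split-construction sketch matching the counts in the row above.
# `load_examples` is a hypothetical placeholder for dataset loading.
import random

def make_splits(examples, n_train, n_test, val_fraction=0.1, seed=0):
    """Shuffle examples, carve out train/test, then hold out validation from train."""
    assert len(examples) >= n_train + n_test
    rng = random.Random(seed)
    examples = examples[:]
    rng.shuffle(examples)
    train, test = examples[:n_train], examples[n_train:n_train + n_test]
    n_val = int(val_fraction * len(train))
    return train[n_val:], train[:n_val], test  # train, val, test

# Example usage with the reported counts:
# cite_train, cite_val, cite_test = make_splits(load_examples("cite++"), 3439, 860)
# clue_train, clue_val, clue_test = make_splits(load_examples("clue"), 6047, 1512)
```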
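
The "Software Dependencies" row names the building blocks (ResNet50, word2vec, LSTM, Gensim) without versions. The sketch below shows how such components could be assembled into a coherence-aware dual encoder with a relation-prediction head; the layer sizes, fusion by concatenation, and class/argument names are assumptions, since the exact CMCM architecture and hyperparameters are specified in the paper and its appendix.

```python
# A minimal PyTorch sketch of a coherence-aware dual-encoder retriever:
# ResNet50 image encoder, LSTM text encoder over word2vec-style embeddings,
# and a head that predicts the image-text coherence relation. Sizes and
# names are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torchvision.models as models

class CoherenceAwareRetriever(nn.Module):
    def __init__(self, vocab_size, num_relations, embed_dim=300,
                 hidden_dim=512, joint_dim=512):
        super().__init__()
        # Image branch: ResNet50 backbone with its classifier removed.
        resnet = models.resnet50(weights=None)  # or ImageNet weights, if desired
        self.image_encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.image_proj = nn.Linear(2048, joint_dim)
        # Text branch: embedding layer (could be initialized from word2vec) + LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.text_proj = nn.Linear(hidden_dim, joint_dim)
        # Coherence head: predicts the image-text coherence relation from the
        # joint representation (this is what makes the model "coherence-aware").
        self.relation_head = nn.Linear(2 * joint_dim, num_relations)

    def forward(self, images, token_ids):
        img = self.image_proj(self.image_encoder(images).flatten(1))
        _, (h, _) = self.lstm(self.embed(token_ids))
        txt = self.text_proj(h[-1])
        relation_logits = self.relation_head(torch.cat([img, txt], dim=-1))
        return img, txt, relation_logits  # retrieval embeddings + relation prediction
```

At training time one would typically combine a retrieval loss over the projected embeddings with a cross-entropy loss on the relation logits; the paper's appendix gives the actual objectives and hyperparameters.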