Cross-Modal Coherence for Text-to-Image Retrieval
Authors: Malihe Alikhani, Fangda Han, Hareesh Ravi, Mubbasir Kapadia, Vladimir Pavlovic, Matthew Stone
AAAI 2022, pp. 10427–10435 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our analysis shows that models trained with image text coherence relations can retrieve images originally paired with target text more often than coherence-agnostic models. We also show via human evaluation that images retrieved by the proposed coherence-aware model are preferred over a coherence-agnostic baseline by a huge margin. |
| Researcher Affiliation | Academia | ¹University of Pittsburgh, ²Rutgers University |
| Pseudocode | No | The paper describes the model architecture and steps but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | Code, contact and data: https://github.com/klory/Cross-Modal-Coherence-for-Text-to-Image-Retrieval |
| Open Datasets | Yes | We study the efficacy of CMCM for image-retrieval by leveraging two image-text datasets CITE++ and Clue (Alikhani et al. 2020) that are annotated with image-text coherence relations. |
| Dataset Splits | Yes | We split the CITE++ dataset as 3439/860 for training/testing and the Clue dataset as 6047/1512 for training/testing; 10% of the training data is used as validation. (See the split sketch after the table.) |
| Hardware Specification | No | The paper describes the network details and experimental setup but does not specify any particular GPU models, CPU types, or other hardware used for running experiments. |
| Software Dependencies | No | The paper mentions software components like ResNet50, word2vec, LSTM, and Gensim, but does not provide specific version numbers for these or any other software dependencies needed for replication. (See the dependency sketch after the table.) |
| Experiment Setup | No | Further training and hyperparameter details are given in the appendix. |
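For readers attempting a reproduction, here is a minimal sketch of how the reported splits could be recreated. The counts (CITE++: 3439/860, Clue: 6047/1512, with 10% of training held out as validation) come from the paper; the shuffling policy, seed, and the `split_dataset` helper are assumptions, not the authors' code.

```python
import random

def split_dataset(samples, n_train, seed=0):
    """Split samples into train/test, then carve 10% of train for validation.

    Mirrors the counts reported in the paper (CITE++: 3439/860,
    Clue: 6047/1512). The seed and shuffling policy are assumptions.
    """
    rng = random.Random(seed)
    samples = samples[:]          # avoid mutating the caller's list
    rng.shuffle(samples)
    train, test = samples[:n_train], samples[n_train:]
    n_val = int(0.1 * len(train))  # "10% of the training data is used as validation"
    return train[n_val:], train[:n_val], test

# e.g. for CITE++ (3439 + 860 = 4299 annotated image-text pairs):
# train, val, test = split_dataset(cite_pairs, n_train=3439)
```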
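Since no versions are pinned, the sketch below shows one plausible way to assemble the components the paper names (a ResNet50 image encoder, word2vec embeddings via Gensim, and an LSTM text encoder) in PyTorch. The hidden size, the `word2vec-google-news-300` checkpoint, and the encoder wiring are assumptions; the paper's appendix holds the actual hyperparameters.

```python
import torch.nn as nn
from torchvision.models import resnet50
import gensim.downloader

# Image encoder: ResNet50 backbone with the classifier head removed.
# (Exact torchvision version is unspecified in the paper.)
image_encoder = resnet50(pretrained=True)
image_encoder.fc = nn.Identity()  # yields 2048-d image features

# Text encoder: pretrained word2vec embeddings fed through an LSTM.
# The public Google News checkpoint is an assumption; the paper only
# says word2vec is loaded via Gensim.
w2v = gensim.downloader.load("word2vec-google-news-300")  # 300-d vectors
text_encoder = nn.LSTM(input_size=300, hidden_size=512, batch_first=True)
```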