Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Enhancing Cross-modal Completion and Alignment for Unsupervised Incomplete Text-to-Image Person Retrieval
Authors: Tiantian Gong, Junsheng Wang, Liyan Zhang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on public datasets, fully demonstrate the consistent superiority of our method over SOTA text-image person retrieval methods. |
| Researcher Affiliation | Academia | Tiantian Gong¹, Junsheng Wang², Liyan Zhang¹; ¹Nanjing University of Aeronautics and Astronautics, ²Nanjing University of Science and Technology |
| Pseudocode | No | The paper describes the proposed method in detail in Section 3, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not contain any explicit statements about making the source code available or providing a link to a code repository. |
| Open Datasets | Yes | CUHK-PEDES [Li et al., 2017b] comprises 40,206 pedestrian images along with 80,412 text descriptions corresponding to 13,003 distinct pedestrian identities. ... ICFG-PEDES [Ding et al., 2021] comprises 54,522 images with 4,102 distinct identities. |
| Dataset Splits | Yes | Challenging Data Partitions. We define three distinct settings to represent varying levels of difficulty. For the easy setting, we use 50% of the training set as the complete image-text pair data, 25% as missing image data, and 25% as missing text data, denoted as (50%, 25%, 25%). Similarly, we establish the medium setting, defined as (30%, 35%, 35%), and the hard setting as (10%, 45%, 45%) to elevate the training complexity. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions using 'the image encoder and text encoder components of the Clip [Radford et al., 2021] model' and 'Adam optimizer [Kingma and Ba, 2014]' and NLTK [Loper and Bird, 2002], but it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | All images are resized to 384 × 128 pixels. For the text modality, the maximum length of text tokens is set to 80. The model is optimized via the Adam optimizer [Kingma and Ba, 2014] with a 0.0001 learning ratio. The batch size is set to 64, and the training process spans across a total of 60 epochs. The temperature parameter τ (Equations 19 and 24) is set to 0.02. |
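
The three partition settings quoted in the Dataset Splits row can be illustrated with a short sketch. This is not the authors' code (the table above records that none was released). It assumes the training pairs are shuffled uniformly at random before being divided, which the quoted text does not state; the names `SETTINGS` and `partition_pairs` and the `seed` parameter are hypothetical.

```python
import random

# Fractions quoted in the Dataset Splits row:
# (complete image-text pairs, missing image, missing text).
SETTINGS = {
    "easy": (0.50, 0.25, 0.25),
    "medium": (0.30, 0.35, 0.35),
    "hard": (0.10, 0.45, 0.45),
}


def partition_pairs(pair_ids, setting, seed=0):
    """Split training pair IDs into complete / missing-image / missing-text groups.

    Assumption: a uniform random shuffle decides which pairs lose a modality.
    The paper may sample differently (for example, per identity).
    """
    complete_frac, missing_image_frac, _ = SETTINGS[setting]
    ids = list(pair_ids)
    random.Random(seed).shuffle(ids)  # seeded so the split is repeatable

    n_complete = round(len(ids) * complete_frac)
    n_missing_image = round(len(ids) * missing_image_frac)
    return {
        "complete": ids[:n_complete],
        "missing_image": ids[n_complete:n_complete + n_missing_image],
        # The remainder goes to the missing-text group so every pair is used once.
        "missing_text": ids[n_complete + n_missing_image:],
    }
```

For example, `partition_pairs(range(1000), "hard")` yields groups of 100, 450, and 450 pair IDs. Fixing the seed keeps the split repeatable across runs, which matters when comparing methods under the same setting.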