Enhancing Cross-modal Completion and Alignment for Unsupervised Incomplete Text-to-Image Person Retrieval
Authors: Tiantian Gong, Junsheng Wang, Liyan Zhang
IJCAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on public datasets fully demonstrate the consistent superiority of our method over SOTA text-to-image person retrieval methods. |
| Researcher Affiliation | Academia | Tiantian Gong (1), Junsheng Wang (2), Liyan Zhang (1); (1) Nanjing University of Aeronautics and Astronautics, (2) Nanjing University of Science and Technology |
| Pseudocode | No | The paper describes the proposed method in detail in Section 3, but it does not include a formally labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not contain any explicit statements about making the source code available or providing a link to a code repository. |
| Open Datasets | Yes | CUHK-PEDES [Li et al., 2017b] comprises 40,206 pedestrian images along with 80,412 text descriptions corresponding to 13,003 distinct pedestrian identities. ... ICFG-PEDES [Ding et al., 2021] comprises 54,522 images with 4,102 distinct identities. |
| Dataset Splits | Yes | Challenging Data Partitions. We define three distinct settings to represent varying levels of difficulty. For the easy setting, we use 50% of the training set as the complete image-text pair data, 25% as missing image data, and 25% as missing text data, denoted as (50%, 25%, 25%). Similarly, we establish the medium setting, defined as (30%, 35%, 35%), and the hard setting as (10%, 45%, 45%) to elevate the training complexity. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions using 'the image encoder and text encoder components of the CLIP [Radford et al., 2021] model', the 'Adam optimizer [Kingma and Ba, 2014]', and NLTK [Loper and Bird, 2002], but it does not specify version numbers for any of these software dependencies. |
| Experiment Setup | Yes | All images are resized to 384 × 128 pixels. For the text modality, the maximum length of text tokens is set to 80. The model is optimized via the Adam optimizer [Kingma and Ba, 2014] with a 0.0001 learning rate. The batch size is set to 64, and the training process spans a total of 60 epochs. The temperature parameter τ (Equations 19 and 24) is set to 0.02. |
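Since the paper releases no code, the hyperparameters and data partitions quoted above can be collected into a minimal configuration sketch. This is an illustrative reconstruction only: all variable and key names are assumptions, not identifiers from the authors' implementation.

```python
# Hypothetical configuration mirroring the setup reported in the paper.
# Key names are illustrative; the paper does not publish source code.
config = {
    "image_size": (384, 128),    # all images resized to 384 x 128 pixels
    "max_text_tokens": 80,       # maximum text token length
    "optimizer": "Adam",         # Adam [Kingma and Ba, 2014]
    "learning_rate": 1e-4,
    "batch_size": 64,
    "epochs": 60,
    "temperature_tau": 0.02,     # tau in Equations 19 and 24
    # (complete pairs, missing images, missing texts) fractions of the
    # training set, per the paper's three difficulty settings:
    "partitions": {
        "easy":   (0.50, 0.25, 0.25),
        "medium": (0.30, 0.35, 0.35),
        "hard":   (0.10, 0.45, 0.45),
    },
}

# Sanity check: each partition setting should cover the full training set.
for name, fractions in config["partitions"].items():
    assert abs(sum(fractions) - 1.0) < 1e-9, name
```

Each partition triple sums to 1.0, which matches the paper's description of splitting the entire training set into complete-pair, missing-image, and missing-text subsets.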