Causality-Inspired Invariant Representation Learning for Text-Based Person Retrieval

Authors: Yu Liu, Guihe Qin, Haipeng Chen, Zhiyong Cheng, Xun Yang

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on three datasets clearly demonstrate the advantages of IRLT over leading baselines in terms of accuracy and generalization.
Researcher Affiliation | Academia | Yu Liu1,2, Guihe Qin1,2, Haipeng Chen1,2*, Zhiyong Cheng3, Xun Yang4. 1College of Computer Science and Technology, Jilin University, China; 2Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China; 3Qilu University of Technology (Shandong Academy of Sciences), Jinan, China; 4University of Science and Technology of China, Hefei, China
Pseudocode | No | The paper does not contain any explicit pseudocode blocks or algorithms.
Open Source Code | No | The paper does not provide any statement or link indicating that the source code for the methodology is openly available.
Open Datasets | Yes | Datasets. CUHK-PEDES (Li et al. 2017) is a pioneering dataset specifically designed for text-to-image person retrieval. ICFG-PEDES (Ding et al. 2021) consists of a total of 54,522 images representing 4,102 distinct identities. RSTPReid (Zhu et al. 2021) comprises 20,505 images depicting 4,101 unique identities captured by 15 cameras.
Dataset Splits | Yes | For all three datasets, we follow their official data splits for experiments and utilize the Rank-k metrics (with k values of 1, 5, and 10) as the principal evaluation metrics.
Hardware Specification | Yes | We conduct our experiments using a single RTX 3090 GPU with 24GB of memory.
Software Dependencies | No | The paper mentions optimizers (Adam) and backbone models (ResNet-50, BERT, CLIP) but does not provide specific version numbers for the software libraries or frameworks (e.g., PyTorch, TensorFlow) used for implementation.
Experiment Setup | Yes | The input image is resized to 384 × 128. Adam (Kingma and Ba 2014) is used as the optimizer, training for 60 epochs with a batch size of 64. The hyper-parameters λ2 and λ3 are fixed as 1 and 0.1, respectively. The text length is set to 64 and 77, respectively. The dimension d of the representation is 1024 and 512, respectively. The initial learning rate is 5e-4 and 1e-5, respectively. The learning rate decay strategies are fixed-step decay (0.1 times decay every 10 epochs) and cosine learning rate decay, respectively. The hyperparameter λ1 is 0.5 and 0.1, respectively.
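The Experiment Setup row pins down most of the training configuration. As a point of reference, the sketch below shows one way those values map onto standard PyTorch components. It is a minimal, hypothetical reconstruction (the authors release no code), and it assumes the paired "respectively" values refer to the ResNet-50/BERT and CLIP backbone variants mentioned under Software Dependencies; all function and variable names are illustrative.

```python
# Hypothetical reconstruction of the quoted training setup (not the authors' code).
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR, CosineAnnealingLR

EPOCHS = 60      # "training for 60 epochs"
BATCH_SIZE = 64  # "batch size of 64"

def build_optimizer_and_scheduler(model: torch.nn.Module, backbone: str):
    """backbone: 'resnet_bert' or 'clip' (assumed to be the two 'respectively' settings)."""
    if backbone == "resnet_bert":
        # ResNet-50 + BERT variant: lr 5e-4, fixed-step decay (x0.1 every 10 epochs)
        optimizer = Adam(model.parameters(), lr=5e-4)
        scheduler = StepLR(optimizer, step_size=10, gamma=0.1)
    else:
        # CLIP variant: lr 1e-5, cosine learning-rate decay over the 60 epochs
        optimizer = Adam(model.parameters(), lr=1e-5)
        scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS)
    return optimizer, scheduler

# Loss weights quoted in the setup: lambda_2 = 1, lambda_3 = 0.1, and
# lambda_1 = 0.5 (ResNet-50/BERT) or 0.1 (CLIP). How these combine with the
# paper's individual loss terms is specific to IRLT and not reproduced here.
```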
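The Rank-k metric cited under Dataset Splits is the standard top-k retrieval accuracy used in person retrieval: a query counts as a hit if any of its k highest-scoring gallery images shares its identity. A minimal sketch of how it is typically computed from a text-to-image similarity matrix follows; the variable names are illustrative and not taken from the paper.

```python
# Sketch of Rank-k accuracy for text-to-image retrieval (standard metric, not paper code).
import numpy as np

def rank_k_accuracy(similarity, query_ids, gallery_ids, ks=(1, 5, 10)):
    """similarity: (num_queries, num_gallery) text-to-image similarity scores.
    query_ids / gallery_ids: identity labels of each query caption / gallery image.
    Returns {k: fraction of queries whose top-k results contain a correct identity}."""
    order = np.argsort(-similarity, axis=1)              # gallery indices, best score first
    matches = gallery_ids[order] == query_ids[:, None]   # True where retrieved identity is correct
    return {k: float(matches[:, :k].any(axis=1).mean()) for k in ks}

# Illustrative usage with random scores:
# sims = np.random.rand(5, 20)
# q_ids = np.arange(5)
# g_ids = np.random.randint(0, 5, size=20)
# print(rank_k_accuracy(sims, q_ids, g_ids))
```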