Multi-Modal Disordered Representation Learning Network for Description-Based Person Search
Authors: Fan Yang, Wei Li, Menglong Yang, Binbin Liang, Jianwei Zhang
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on two public datasets, and the results show that our method outperforms the state-of-the-art methods on the CUHK-PEDES and ICFG-PEDES datasets and achieves superior performance. |
| Researcher Affiliation | Academia | Fan Yang, Wei Li*, Menglong Yang, Binbin Liang, Jianwei Zhang Sichuan University, Chengdu, China |
| Pseudocode | No | The paper includes figures illustrating the architecture and mathematical formulations but no explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about open-sourcing code or provide links to a code repository. |
| Open Datasets | Yes | We evaluate the proposed method on CUHK-PEDES (Li et al. 2017b) and ICFG-PEDES (Ding et al. 2021), two public description-based person Re-ID datasets. |
| Dataset Splits | Yes | We utilize the same data split as (Li et al. 2017b): 34,054 images with 11,003 identities for training, 3,078 images with 1,000 identities for validation, and 3,074 images with 1,000 pedestrians for testing. The ICFG-PEDES dataset... For training and testing, the dataset is split into 3,102 and 1,000 pedestrians, respectively. |
| Hardware Specification | No | The paper mentions training the network and running experiments but does not provide any specific details about the hardware used (e.g., GPU/CPU models, memory). |
| Software Dependencies | No | The paper mentions using BERT and Vision Transformer as models, and SGD for optimization, but does not specify any software versions for libraries, frameworks, or operating systems (e.g., PyTorch version, Python version, CUDA version). |
| Experiment Setup | Yes | We resize all input images to 384×128 and unify the length of input texts to 64. We train the whole network for 150 epochs and apply SGD to optimize the model with a weight decay of 0.01 and a momentum of 0.9. The learning rate is set to 7×10⁻⁵ and is initialized by the warm-up trick in the first 10 epochs. (See the training-setup sketch below the table.) |
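
The reported setup maps directly onto a standard training configuration. Below is a minimal sketch assuming a PyTorch/torchvision implementation; since the paper releases no code, the model stand-in, the preprocessing pipeline, and the linear shape of the warm-up schedule are assumptions, while the numeric hyperparameters (384×128 images, text length 64, 150 epochs, SGD with weight decay 0.01 and momentum 0.9, learning rate 7×10⁻⁵ with a 10-epoch warm-up) are taken from the quote above.

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR
from torchvision import transforms

# Preprocessing as reported: all input images resized to 384x128.
preprocess = transforms.Compose([
    transforms.Resize((384, 128)),  # (height, width)
    transforms.ToTensor(),
])

# Hypothetical stand-in for the paper's network, which is not released.
model = torch.nn.Linear(768, 768)

BASE_LR = 7e-5       # learning rate reported in the paper
WARMUP_EPOCHS = 10   # warm-up applied in the first 10 epochs
TOTAL_EPOCHS = 150   # total training epochs

# SGD with the reported weight decay (0.01) and momentum (0.9).
optimizer = SGD(model.parameters(), lr=BASE_LR,
                momentum=0.9, weight_decay=0.01)

# The paper only says the learning rate "is initialized by the warm-up
# trick"; a linear ramp over the first 10 epochs is assumed here.
def warmup_factor(epoch: int) -> float:
    if epoch < WARMUP_EPOCHS:
        return (epoch + 1) / WARMUP_EPOCHS
    return 1.0

scheduler = LambdaLR(optimizer, lr_lambda=warmup_factor)

for epoch in range(TOTAL_EPOCHS):
    # Forward/backward passes over 384x128 image crops and text
    # sequences truncated or padded to length 64 would go here.
    optimizer.step()
    scheduler.step()
```

The scheduler is stepped once per epoch rather than per iteration, matching the paper's epoch-level description of the warm-up; a per-iteration warm-up would be an equally plausible reading.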