Self-Supervised Gait Encoding with Locality-Aware Attention for Person Re-Identification

Authors: Haocong Rao, Siqi Wang, Xiping Hu, Mingkui Tan, Huang Da, Jun Cheng, Bin Hu

IJCAI 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our approach typically improves existing skeleton-based methods by 10-20% Rank-1 accuracy, and it achieves comparable or even superior performance to multi-modal methods with extra RGB or depth information. We evaluate our method on three public Re-ID datasets that provide 3D skeleton data: BIWI, IAS-Lab and the Kinect Gait Biometry Dataset (KGBD). In Table 1, we conduct an extensive comparison with existing skeleton-based person Re-ID methods. We perform an ablation study to verify the effectiveness of each model component. As shown in Table 2, we draw the following conclusions: (a) The proposed encoder-decoder architecture (GE-GD) performs remarkably better (5.4% Rank-1 accuracy and 4.5% nAUC gain) than the supervised learning paradigm that uses GE only, which verifies the necessity of the encoder-decoder architecture and the skeleton reconstruction mechanism." (A sketch of the reported metrics appears after this table.)
Researcher Affiliation | Academia | Haocong Rao (1,3,5), Siqi Wang (2), Xiping Hu (1,4,5), Mingkui Tan (3), Huang Da (2), Jun Cheng (1,5), Bin Hu (4). Affiliations: 1 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; 2 National University of Defense Technology; 3 South China University of Technology; 4 Lanzhou University; 5 The Chinese University of Hong Kong, Hong Kong
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | "Codes are available at https://github.com/Kali-Hac/SGE-LA."
Open Datasets | Yes | "We evaluate our method on three public Re-ID datasets that provide 3D skeleton data: BIWI [Munaro et al., 2014b], IAS-Lab [Munaro et al., 2014c] and the Kinect Gait Biometry Dataset (KGBD) [Andersson and Araujo, 2015]."
Dataset Splits | Yes | "For BIWI, we use the full training set and the Walking testing set that contains dynamic skeleton data; for IAS-Lab, we use the full training set and two test splits, IAS-A and IAS-B; for KGBD, since no training and testing splits are given, we randomly leave one skeleton video of each person for testing and use the rest of the videos for training. The experiments are repeated multiple times and the average performance is reported on KGBD. We discard the first and last 10 frames of each original skeleton sequence to avoid ineffective skeleton recording. Then, we split the training dataset into multiple skeleton sequences of length f, and two consecutive sequences share f/2 overlapping skeletons, which aims to obtain as many skeleton sequences as possible to train our model." (This splitting procedure is sketched in code after the table.)
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | "The sequence length f is empirically set to 6 as it achieves the best average performance among different sequence-length settings. To learn locality-aware attention over the whole sequence, the attentional range D of LA is set to 6. We use a 2-layer LSTM with k = 256 hidden units per layer for both GE and GD. We empirically set both λR and λA to 1, while a momentum of 0.9 is utilized for optimization. We use a learning rate lr = 0.0005, and we set the weight of L2 regularization β to 0.02. The batch size is set to 128 in all experiments." (These hyperparameters are collected in the configuration sketch after the table.)
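
The Research Type row above reports gains in Rank-1 accuracy and nAUC. As a minimal sketch of how these two Re-ID metrics are conventionally computed from a query-gallery distance matrix (the paper's text does not include evaluation code; the function names and input layout here are assumptions):

```python
import numpy as np

def rank1_accuracy(dist, query_ids, gallery_ids):
    """Fraction of queries whose nearest gallery sample has the same person ID.

    dist: (num_query, num_gallery) pairwise distance matrix.
    """
    nearest = np.argmin(dist, axis=1)             # closest gallery index per query
    return float(np.mean(gallery_ids[nearest] == query_ids))

def nauc(dist, query_ids, gallery_ids):
    """Normalized area under the cumulative match characteristic (CMC) curve."""
    num_gallery = dist.shape[1]
    order = np.argsort(dist, axis=1)              # gallery ranked by distance per query
    matches = gallery_ids[order] == query_ids[:, None]
    first_hit = matches.argmax(axis=1)            # rank of first correct match (0-based)
    cmc = np.mean(first_hit[:, None] <= np.arange(num_gallery)[None, :], axis=0)
    return float(cmc.mean())                      # AUC of the CMC, normalized to [0, 1]
```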
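The Dataset Splits row specifies the preprocessing exactly: trim 10 frames at both ends, then cut each video into length-f sequences whose neighbors overlap by f/2 frames. A minimal sketch of that procedure, assuming each skeleton video is a NumPy array (the array layout and function name are illustrative, not from the paper):

```python
import numpy as np

def split_into_sequences(video, f=6, trim=10):
    """Cut one skeleton video into fixed-length training sequences.

    video: (num_frames, num_joints, 3) array of 3D skeletons (layout assumed).
    f:     sequence length (6 in the paper).
    trim:  frames discarded at the start and end of each recording.
    """
    video = video[trim:len(video) - trim]   # drop first and last `trim` frames
    step = f - f // 2                       # stride so neighbors share f/2 skeletons
    return [video[i:i + f] for i in range(0, len(video) - f + 1, step)]
```

With f = 6 this yields sequences starting at frames 0, 3, 6, ..., each sharing 3 skeletons with its predecessor, which maximizes the number of training sequences as the quote describes.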
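Finally, the Experiment Setup row fixes every training hyperparameter. The sketch below collects them around a minimal 2-layer LSTM encoder-decoder (GE/GD) in PyTorch; the module structure, the 20-joint input size, and the use of plain SGD with momentum are assumptions for illustration, and the authors' released code may organize this differently:

```python
import torch
import torch.nn as nn

# Hyperparameters quoted in the setup row.
f, D, k = 6, 6, 256                        # sequence length, attentional range, hidden units
lambda_R, lambda_A = 1.0, 1.0              # loss weights λR, λA
lr, beta, batch_size = 0.0005, 0.02, 128   # learning rate, L2 weight, batch size

class Seq2SeqGait(nn.Module):
    """Minimal GE/GD skeleton: 2-layer LSTM encoder and decoder (dims assumed).

    The locality-aware attention with range D is omitted from this sketch.
    """
    def __init__(self, joint_dim=20 * 3, hidden=k, layers=2):
        super().__init__()
        self.encoder = nn.LSTM(joint_dim, hidden, num_layers=layers, batch_first=True)
        self.decoder = nn.LSTM(joint_dim, hidden, num_layers=layers, batch_first=True)
        self.out = nn.Linear(hidden, joint_dim)

    def forward(self, seq):
        # Encode the (batch, f, joint_dim) sequence; seed the decoder with the
        # encoder's final states and reconstruct the skeletons step by step.
        _, state = self.encoder(seq)
        dec, _ = self.decoder(seq, state)
        return self.out(dec)

model = Seq2SeqGait()
# Momentum 0.9 and L2 weight β = 0.02 as quoted; the optimizer family itself
# is not named in the excerpt, so SGD here is an assumption.
optimizer = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9, weight_decay=beta)
```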