Pedestrian Attribute Recognition by Joint Visual-semantic Reasoning and Knowledge Distillation

Authors: Qiaozhe Li, Xin Zhao, Ran He, Kaiqi Huang

IJCAI 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The proposed framework is verified on three large scale pedestrian attribute datasets including PETA, RAP, and PA100k. Experiments show that our method achieves state-of-the-art results.
Researcher Affiliation | Academia | Qiaozhe Li (1,3), Xin Zhao (1,3), Ran He (2,3) and Kaiqi Huang (1,3); (1) CRISE, CASIA; (2) CRIPAC & NLPR, CASIA; (3) University of Chinese Academy of Sciences; liqiaozhe2015@ia.ac.cn, {xzhao,rhe,kqhuang}@nlpr.ia.ac.cn
Pseudocode | No | The paper describes its methodology using text and mathematical equations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not state that source code for the described methodology will be released, nor does it link to a code repository.
Open Datasets | Yes | The proposed framework is verified on three large scale pedestrian attribute datasets including PETA, RAP, and PA100k. The PEdesTrian Attribute (PETA) dataset [Deng et al., 2014]... The Richly Annotated Pedestrian (RAP) attribute dataset [Li et al., 2016a]... The PA100k Dataset [Liu et al., 2017]... Densepose [Alp Guler et al., 2018] dataset for training.
Dataset Splits | Yes | The PEdesTrian Attribute (PETA) dataset [Deng et al., 2014] consists of 19,000 person images collected from 10 small-scale person datasets. The whole dataset is randomly divided into three non-overlapping partitions: 9500 for training, 1900 for verification, and 7600 for evaluation. The PA100k Dataset [Liu et al., 2017] consists of 100,000 pedestrian images from 598 outdoor scenes. Each image is described with 26 commonly used attributes. The whole dataset is split into training, validation and test sets with a ratio of 8:1:1.
Hardware Specification | No | The paper does not provide specific hardware details such as the GPU or CPU models used for running the experiments. It only mentions using a 'ResNet-50 network', which is a model architecture rather than hardware.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, or TensorFlow versions).
Experiment Setup | Yes | The parsing net takes images of size 512 × 512 as inputs and outputs prediction maps of size 30 × 30. The network is trained for 20 epochs with a batch size of 8. For data augmentation, the input images are randomly scaled from 384 × 192 to 256 × 128 for each mini-batch. D_v is 2048 and D_s is set to 512. The temperature T is set to 3. The network is optimized by the stochastic gradient descent algorithm with a batch size of 16, a momentum of 0.9 and a weight decay of 0.0005. The initial learning rate is set to 0.001 and is divided by 10 after every 30 epochs. The reasoning network is trained for 60 epochs.
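
The step learning-rate schedule and the split arithmetic quoted above can be checked with a small sketch. This is not the authors' code (none was released); the function names `learning_rate` and `split_sizes` are hypothetical, and the sketch assumes 0-indexed epochs and that the PETA partition corresponds to a 5:1:4 ratio (inferred from 9500/1900/7600).

```python
# Values quoted from the Experiment Setup row; everything else is illustrative.
BASE_LR = 0.001        # initial learning rate
LR_DECAY_FACTOR = 10   # learning rate is divided by 10 ...
LR_DECAY_EVERY = 30    # ... after every 30 epochs
TOTAL_EPOCHS = 60      # the reasoning network is trained for 60 epochs


def learning_rate(epoch: int) -> float:
    """Step schedule; `epoch` is assumed to be 0-indexed."""
    return BASE_LR / (LR_DECAY_FACTOR ** (epoch // LR_DECAY_EVERY))


def split_sizes(total: int, ratios: tuple) -> list:
    """Split `total` items proportionally to `ratios`.

    Any rounding remainder is assigned to the first part (an assumption;
    the paper does not say how remainders are handled).
    """
    denom = sum(ratios)
    sizes = [total * r // denom for r in ratios]
    sizes[0] += total - sum(sizes)
    return sizes


if __name__ == "__main__":
    for epoch in (0, 29, 30, TOTAL_EPOCHS - 1):
        print(f"epoch {epoch:2d}: lr = {learning_rate(epoch):g}")
    print("PA100k (8:1:1):", split_sizes(100_000, (8, 1, 1)))
    print("PETA   (5:1:4):", split_sizes(19_000, (5, 1, 4)))
```

Under these assumptions the schedule yields two learning-rate stages over the 60 epochs, and the computed split sizes match the counts reported in the Dataset Splits row.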