Generating Person Images with Appearance-aware Pose Stylizer

Authors: Siyu Huang, Haoyi Xiong, Zhi-Qi Cheng, Qingzhong Wang, Xingran Zhou, Bihan Wen, Jun Huan, Dejing Dou

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on two benchmark datasets show that our method is capable of generating visually appealing and realistic-looking results using arbitrary image and pose inputs. We have conducted extensive quantitative and qualitative experiments, and ablation studies to validate the effectiveness of the proposed methods. We conduct experiments on two benchmark person image generation datasets including Market-1501 [Zheng et al., 2015] and DeepFashion (In-shop Clothes Retrieval Benchmark) [Liu et al., 2016]."
Researcher Affiliation | Collaboration | Siyu Huang (Baidu Research), Haoyi Xiong (Baidu Research), Zhi-Qi Cheng (Carnegie Mellon University), Qingzhong Wang (City University of Hong Kong), Xingran Zhou (Zhejiang University), Bihan Wen (Nanyang Technological University), Jun Huan (Styling AI), Dejing Dou (Baidu Research)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures).
Open Source Code | Yes | "Code is available at https://github.com/siyuhuang/PoseStylizer"
Open Datasets | Yes | "We conduct experiments on two benchmark person image generation datasets including Market-1501 [Zheng et al., 2015] and DeepFashion (In-shop Clothes Retrieval Benchmark) [Liu et al., 2016]."
Dataset Splits | No | "By following the settings in [Zhu et al., 2019], for Market-1501, we collect 263,632 training pairs and 12,000 testing pairs. For DeepFashion, we collect 101,966 pairs for training and 8,570 pairs for testing." The paper specifies training and testing pairs but does not mention a separate validation set or describe how model selection was performed.
Hardware Specification | No | The paper does not report the hardware used for its experiments (no GPU/CPU models, clock speeds, or memory amounts); it only mentions the PyTorch framework.
Software Dependencies | No | "We implement our model on the PyTorch framework [Paszke et al., 2017]. The model is trained with an Adam optimizer [Kingma and Ba, 2014]." No version numbers are given for PyTorch or any other dependency.
Experiment Setup | Yes | "For both encoder and generator, we adopt a total block number L = 4 on the Market-1501 dataset and L = 5 on the DeepFashion dataset, respectively. The first layer of the encoder and the last layer of the generator have 64 channels. The number of channels in every block is doubled/halved in the encoder/generator until a maximum of 512. The size of the feature map in every block is halved/doubled in the encoder/generator using stride-2 convolutions/deconvolutions. The model is trained with an Adam optimizer [Kingma and Ba, 2014] for 800 epochs. The initial learning rate is 0.0002 and it decays linearly to 0 from epoch 400 to epoch 800. Following [Zhu et al., 2019], the loss weights (α, λ1, λ2) are set to (5, 10, 10) on Market-1501 and (5, 1, 1) on DeepFashion. In training on Market-1501, we additionally apply Dropout [Hinton et al., 2012] with a rate of 0.5 after every generator block to guard against overfitting."
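
To make the reported setup concrete, the following is a minimal PyTorch sketch of the block schedule and training configuration quoted above. It is not the authors' implementation (see the linked repository for that): the exact block layout, kernel sizes, normalization and activation choices, and the Adam betas are assumptions, while the channel progression (64, doubled/halved per block up to 512), the stride-2 resampling, the 800-epoch Adam training with linear learning-rate decay from epoch 400, and the 0.5 generator dropout on Market-1501 follow the paper's stated settings.

import torch
import torch.nn as nn

def encoder_channels(num_blocks, base=64, cap=512):
    # 64 channels in the first block, doubled per block, capped at 512.
    return [min(base * 2 ** i, cap) for i in range(num_blocks)]

def build_encoder(in_channels=3, num_blocks=4):
    # Stride-2 convolutions halve the feature map in every block.
    layers, prev = [], in_channels
    for c in encoder_channels(num_blocks):
        layers += [nn.Conv2d(prev, c, kernel_size=4, stride=2, padding=1),
                   nn.InstanceNorm2d(c),   # normalization choice is an assumption
                   nn.ReLU(inplace=True)]
        prev = c
    return nn.Sequential(*layers)

def build_generator(out_channels=3, num_blocks=4, dropout=0.0):
    # Mirror of the encoder: stride-2 deconvolutions double the feature map
    # while channels are halved until the last block reaches 64.
    chans = list(reversed(encoder_channels(num_blocks)))  # e.g. [512, 256, 128, 64]
    layers = []
    for i, c in enumerate(chans):
        nxt = chans[i + 1] if i + 1 < len(chans) else out_channels
        layers.append(nn.ConvTranspose2d(c, nxt, kernel_size=4, stride=2, padding=1))
        if i + 1 < len(chans):
            layers += [nn.InstanceNorm2d(nxt), nn.ReLU(inplace=True)]
            if dropout > 0:
                layers.append(nn.Dropout(dropout))  # 0.5 after every block on Market-1501
    return nn.Sequential(*layers)

# Market-1501 settings: L = 4 blocks and generator dropout 0.5.
# DeepFashion would use num_blocks=5 and dropout=0.0.
encoder = build_encoder(num_blocks=4)
generator = build_generator(num_blocks=4, dropout=0.5)

# Adam for 800 epochs; the betas are not stated in the paper, (0.5, 0.999)
# is a common GAN default. The learning rate starts at 2e-4 and decays
# linearly to 0 between epochs 400 and 800.
params = list(encoder.parameters()) + list(generator.parameters())
optimizer = torch.optim.Adam(params, lr=2e-4, betas=(0.5, 0.999))
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer,
    lr_lambda=lambda epoch: 1.0 if epoch < 400 else max(0.0, (800 - epoch) / 400),
)

for epoch in range(800):
    # ... one pass over the training pairs, computing the weighted losses ...
    scheduler.step()  # advance the linear decay once per epoch

The training loop body is elided because the loss terms weighted by (α, λ1, λ2) are defined in the paper itself, not in the excerpt quoted above.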