Cross-Domain Human Parsing via Adversarial Feature and Label Adaptation
Authors: Si Liu, Yao Sun, Defa Zhu, Guanghui Ren, Yu Chen, Jiashi Feng, Jizhong Han
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted in which the LIP dataset serves as the source domain and four different datasets, spanning surveillance videos, movies, and runway shows and carrying no annotations, are evaluated as target domains. The results consistently confirm the data efficiency and performance advantages of the proposed method on the challenging cross-domain human parsing problem. The paper states: "We conduct extensive experiments to evaluate performance of our model for 4 cross-domain human parsing scenarios." |
| Researcher Affiliation | Collaboration | Si Liu (1,4,5), Yao Sun (1), Defa Zhu (1), Guanghui Ren (1), Yu Chen (2), Jiashi Feng (3), Jizhong Han (1). Affiliations: 1: Institute of Information Engineering, Chinese Academy of Sciences; 2: JD.com; 3: Department of ECE, National University of Singapore; 4: Jiangsu Key Laboratory of Big Data Analysis Technology (B-DAT), Nanjing University of Information Science & Technology; 5: Collaborative Innovation Center of Atmospheric Environment and Equipment Technology, Nanjing University of Information Science & Technology, Nanjing, China |
| Pseudocode | Yes | Algorithm 1: Training details of the integrated cross-domain human parsing framework. Input: source domain images Sx; source domain labels Sy; target domain images Tx; feature extractor E; feature compensation network C; feature adversarial network Af; structured label adversarial network Al; pixel-wise labeler L; number of training iterations N; a constant KC. (A hedged PyTorch-style sketch of this training loop is given after the table.) |
| Open Source Code | No | The paper states: "Thirdly, we will release the source code of our implementation to the academic to facilitate future studies." This indicates a future intention to release the code, not that it is currently available or provided with concrete access details. |
| Open Datasets | Yes | Source domain: We use the LIP dataset (Gong et al. 2017) as the source domain; it contains more than 50,000 images with careful pixel-wise annotations of 19 semantic human parts. Target domains: The following four target domains are investigated in this paper. The Indoor dataset (Liu et al. 2016) contains 1,900 labeled images with 12 semantic human part labels and 15,436 unlabeled images. The Daily Video dataset is a newly collected dataset containing 1,584 labeled images with 12 semantic human part labels and 19,964 unlabeled images. The Prid A and Prid B datasets are selected from camera view A and camera view B of the Person Re-ID 2011 dataset (Roth et al. 2014). |
| Dataset Splits | No | The paper describes the datasets used (LIP, Indoor, Daily Video, Prid A, Prid B) and mentions that "All these scores are obtained on the testing sets of the target domains." It also discusses the target domains having "unlabeled images" for training the cross-domain model. However, it does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or counts) for any of these datasets to allow for reproduction of data partitioning. |
| Hardware Specification | Yes | The experiments are done on a single NVIDIA GeForce GTX TITAN X GPU with 12GB memory. |
| Software Dependencies | No | The paper states, "The whole framework is trained on PyTorch" and "The feature extractor and the pixel-wise labeler use the DeepLab model". While PyTorch and DeepLab are named as software components, no version numbers are given for either, and these are needed for reproducibility. |
| Experiment Setup | Yes | Implementation details: The whole framework is trained on PyTorch with a mini-batch size of 10. The input image size is 241 × 121. The constant KC is 5 in our experiment. During training of the feature adversarial adaptation component, the Adam optimizer is used with β1 = 0.5, β2 = 0.999, and a learning rate of 1e-5. When training the structured label adaptation component, we use the Adam optimizer with β1 = 0.5, β2 = 0.999, and a learning rate of 1e-8. The remaining networks are optimized via the SGD optimizer with momentum of 0.9, learning rate 1e-8, and weight decay of 0.0005. (An optimizer-configuration sketch based on these settings follows the table.) |
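
The Pseudocode row above quotes only the inputs of the authors' Algorithm 1 (E, C, Af, Al, L, N, KC). Below is a minimal PyTorch-style sketch of how such an adversarial feature-and-label adaptation loop could be wired together. The module names and the constant KC follow Algorithm 1's input list; the loss terms, the `detach` pattern, the update order, and the `adv_loss` helper are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn.functional as F

def adv_loss(logits, real):
    """BCE against an all-ones (real) or all-zeros (fake) target (hypothetical helper)."""
    target = torch.ones_like(logits) if real else torch.zeros_like(logits)
    return F.binary_cross_entropy_with_logits(logits, target)

def train(E, C, L, Af, Al, opt_main, opt_Af, opt_Al,
          src_loader, tgt_loader, N, KC=5):
    # Loaders are assumed to yield at least N batches.
    src_iter, tgt_iter = iter(src_loader), iter(tgt_loader)
    for _ in range(N):
        sx, sy = next(src_iter)   # labeled source images and label maps
        tx = next(tgt_iter)       # unlabeled target images

        fs = E(sx)                # source features
        ft = E(tx) + C(tx)        # target features plus compensation

        # (1) Feature adversarial adaptation: train Af for KC steps to
        #     separate source from compensated target features (using KC
        #     this way is an assumption about the algorithm's inner loop).
        for _ in range(KC):
            opt_Af.zero_grad()
            loss_Af = (adv_loss(Af(fs.detach()), real=True)
                       + adv_loss(Af(ft.detach()), real=False))
            loss_Af.backward()
            opt_Af.step()

        # (2) Structured label adaptation: train Al to separate source
        #     from target pixel-wise parsing predictions.
        ps, pt = L(fs), L(ft)     # parsing logits, shape [B, classes, H, W]
        opt_Al.zero_grad()
        loss_Al = (adv_loss(Al(ps.detach()), real=True)
                   + adv_loss(Al(pt.detach()), real=False))
        loss_Al.backward()
        opt_Al.step()

        # (3) Update E, C, and L with the supervised parsing loss on the
        #     source batch plus adversarial terms that push the target
        #     side to fool both discriminators.
        opt_main.zero_grad()
        loss = (F.cross_entropy(ps, sy)
                + adv_loss(Af(ft), real=True)
                + adv_loss(Al(pt), real=True))
        loss.backward()
        opt_main.step()
```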
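
Likewise, a hedged sketch of the optimizer settings quoted in the Experiment Setup row, pairing with the training-loop sketch above. The split of parameters across optimizers is an assumption (Adam for the two adversarial networks, SGD for everything else), and the label-adaptation learning rate is read as 1e-8 on the assumption that the paper PDF's "1e8" lost its minus sign in text extraction.

```python
import torch

def build_optimizers(E, C, L, Af, Al):
    """Optimizers matching the settings quoted in the Experiment Setup row."""
    # Adam for the feature adversarial network Af: lr 1e-5, betas (0.5, 0.999).
    opt_Af = torch.optim.Adam(Af.parameters(), lr=1e-5, betas=(0.5, 0.999))
    # Adam for the structured label adversarial network Al
    # (assumption: the quoted "1e8" is read as a learning rate of 1e-8).
    opt_Al = torch.optim.Adam(Al.parameters(), lr=1e-8, betas=(0.5, 0.999))
    # SGD for the remaining networks (extractor E, compensation C, labeler L).
    opt_main = torch.optim.SGD(
        list(E.parameters()) + list(C.parameters()) + list(L.parameters()),
        lr=1e-8, momentum=0.9, weight_decay=0.0005)
    return opt_main, opt_Af, opt_Al
```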