Pseudo-Labeled Auto-Curriculum Learning for Semi-Supervised Keypoint Localization

Authors: Can Wang, Sheng Jin, Yingda Guan, Wentao Liu, Chen Qian, Ping Luo, Wanli Ouyang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on six keypoint localization benchmark datasets demonstrate that the proposed approach significantly outperforms the previous state-of-the-art SSL approaches.
Researcher Affiliation | Collaboration | Can Wang1, Sheng Jin2,1, Yingda Guan1, Wentao Liu1, Chen Qian1, Ping Luo2, Wanli Ouyang3; 1SenseTime Research and Tetras.AI, 2The University of Hong Kong, 3The University of Sydney; {wangcan, jinsheng}@tetras.ai, {guanyingda, liuwentao, qianchen}@sensetime.com, pluo@cs.hku.hk, wanli.ouyang@sydney.edu.au
Pseudocode | Yes | Algorithm 1: Inner-loop network training; Algorithm 2: Pseudo-Labeled Auto-Curriculum Learning (PLACL)
Open Source Code | No | The paper does not explicitly state that its source code is being released or provide a direct link to a code repository for the described methodology.
Open Datasets | Yes | To show the versatility of PLACL, we conduct experiments on 5 diverse datasets. LSPET (Leeds Sports Pose Extended Dataset) (Johnson & Everingham, 2010; 2011)... MPII Human Pose dataset (Andriluka et al., 2014)... CUB-200-2011 (Caltech UCSD Birds-200-2011) (Welinder et al., 2010)... ATRW (Li et al., 2019c) dataset... MS-COCO 2017 (Lin et al., 2014)... Animal Pose (Cao et al., 2019) dataset...
Dataset Splits | Yes | We use 10,000 images from (Johnson & Everingham, 2011) for training and 2,000 images from (Johnson & Everingham, 2010) for validation and testing. ... We follow (Moskvyak et al., 2020) to use 10,000 random images from MPII train for training, 3,311 images from MPII train for validation and MPII val for evaluation. ... split dataset into training (100 categories with 5,864 images), validation (50 categories with 2,958 images) and testing (50 categories with 2,966 images). ... The dataset consists of 3,610 images for training, 516 for validation, and 1,033 for testing. ... We randomly select 500 images from COCO train for validation, the remaining training set (115k images) for training, and COCO val (5k images) for evaluation. ... It consists of 2,798 images for training, 810 images for validation, and 1,000 images for testing.
Hardware Specification | Yes | Although the RL search process increases the training complexity, the total training cost is not too high (only 1.5 days with 32 NVIDIA Tesla V100 GPUs).
Software Dependencies | No | The paper mentions "Adam (Kingma & Ba, 2015)" as an optimizer and "PPO2 (Schulman et al., 2017)" for curriculum search, but does not specify software versions for programming languages, deep learning frameworks (e.g., PyTorch, TensorFlow), or other libraries used for implementation.
Experiment Setup | Yes | For the outer-loop, the PPO2 (Schulman et al., 2017) search procedure is conducted for T = 16 sampling steps, and in each step M = 8 sets of parameters (curricula) are sampled. The clipping threshold is ϵ = 0.2, and µ^r_{t+1} is updated with a learning rate of α = 0.2. We empirically use R = 6 self-training rounds, and group size G = 10 for curriculum search. For the inner-loop, we follow the common practice (Sun et al., 2019; Contributors, 2020) to train the keypoint localization network with Mean-Squared Error (MSE) loss for N = 210 epochs per round. Adam (Kingma & Ba, 2015) with a learning rate of 0.001 is adopted. We reduce the learning rate by a factor of 10 at the 170-th and 200-th epochs.
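The two-level structure reported above (Algorithm 1 nested inside Algorithm 2) can be sketched in Python. This is a minimal illustration of the reported loop shape only, not the authors' implementation: `train_inner`, `evaluate`, `sample_curriculum`, and `update_policy` are hypothetical callables standing in for the paper's inner-loop trainer, validation reward, PPO2 curriculum sampler, and PPO2 policy update.

```python
def placl_outer_loop(train_inner, evaluate, sample_curriculum, update_policy,
                     rounds=6, steps=16, samples_per_step=8):
    """Hedged sketch of the PLACL search loop: R self-training rounds,
    each running T PPO2 sampling steps that draw M candidate curricula,
    train the keypoint network on each (inner loop), score it on the
    validation set, and update the curriculum policy from the rewards."""
    best_curriculum, best_score = None, float("-inf")
    for _ in range(rounds):                      # R = 6 self-training rounds
        for _ in range(steps):                   # T = 16 sampling steps
            batch = [sample_curriculum() for _ in range(samples_per_step)]  # M = 8
            rewards = []
            for curriculum in batch:
                model = train_inner(curriculum)  # inner-loop training (Algorithm 1)
                score = evaluate(model)          # validation reward for this curriculum
                rewards.append(score)
                if score > best_score:
                    best_curriculum, best_score = curriculum, score
            update_policy(batch, rewards)        # PPO2-style policy update (assumed hook)
    return best_curriculum, best_score
```

With toy callables (e.g., a curriculum that is a single confidence threshold and a reward that peaks at 0.5), the loop returns the best-scoring curriculum seen during the search.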
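The inner-loop schedule in the Experiment Setup row (Adam at 0.001 for 210 epochs, reduced by a factor of 10 at epochs 170 and 200) corresponds to a standard step learning-rate decay. A minimal framework-agnostic sketch, since the paper does not name its deep learning framework:

```python
def inner_loop_lr(epoch, base_lr=1e-3, milestones=(170, 200), gamma=0.1):
    """Step LR schedule matching the reported setup: start at 1e-3 and
    multiply by 0.1 at each milestone epoch (170 and 200 of 210 total)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

So epochs 0-169 train at 1e-3, epochs 170-199 at 1e-4, and epochs 200-209 at 1e-5, which is the common MMPose/HRNet-style schedule the paper cites as "common practice".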