Dense Keypoints via Multiview Supervision

Authors: Zhixuan Yu, Haozheng Yu, Long Sha, Sujoy Ganguly, Hyun Soo Park

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (4 experiments) | "We perform experiments on human and monkey targets as two example applications to evaluate the effectiveness of our proposed semi-supervised learning pipeline."
Researcher Affiliation | Collaboration | Zhixuan Yu (University of Minnesota, yu000064@umn.edu), Haozheng Yu (University of Minnesota, yu000424@umn.edu), Long Sha (TuSimple, long.sha@tusimple.ai), Sujoy Ganguly (Unity, sujoy.ganguly@unity3d.com), Hyun Soo Park (University of Minnesota, hspark@umn.edu)
Pseudocode | No | The paper describes its methods but does not include any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | No | The paper neither states that source code for the described methodology will be released nor links to a code repository.
Open Datasets | Yes | "Human3.6M [11] is a large-scale indoor multiview dataset... For the human dense keypoints, we use 48K human instances in the DensePose-COCO [8] training set to train the initial model... 3DPW [47] is an in-the-wild dataset... Ski-Pose PTZ-Camera Dataset [40] is a multiview dataset... Open Monkey Pose [3] is a large landmark dataset..."
Dataset Splits | Yes | "Human3.6M [11]... Following common protocols, we use subjects S1, S5, S6, S7 and S8 for training, and reserve subjects S9 and S11 for testing." and "Ski-Pose PTZ-Camera Dataset [40]... It contains 8.5K training images and 1.7K testing images. We use its standard train/test split to train and evaluate our model."
Hardware Specification | No | The paper does not specify the hardware used for its experiments (GPU/CPU models, processor types, or memory amounts); it mentions only a neural-network backbone and a deep learning framework.
Software Dependencies | No | The paper mentions software components such as HRNet and PyTorch3D but does not give their version numbers or any other versioned dependency information needed for replication.
Experiment Setup | Yes | "We use HRNet [18] as the backbone network followed by four head networks made up of convolutional layers to predict foreground mask, body part index, and UV coordinates on the canonical body surface, respectively. Each network takes as an input a 224 × 224 image and outputs 15-channel (for foreground mask head only) or 25-channel 56 × 56 feature maps [8]. We train the network in two stages."
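The quoted experiment setup can be summarized as a small structural sketch. This is not the authors' code (none was released); it only records the head configuration the paper describes: an HRNet backbone feeding four convolutional heads, with 15 output channels for the foreground-mask head, 25 for the others, and 56 × 56 output maps from a 224 × 224 input. Splitting the UV prediction into separate U and V heads is an assumption made here to account for the four heads the paper mentions; the head names are illustrative.

```python
# Structural sketch (assumed, not the authors' implementation) of the
# prediction heads described in the Experiment Setup row.

INPUT_SIZE = (224, 224)   # input crop resolution stated in the paper
HEAD_SIZE = (56, 56)      # output feature-map resolution stated in the paper

# Output channels per head; names are hypothetical labels for the four
# heads predicting foreground mask, body-part index, and UV coordinates.
HEAD_CHANNELS = {
    "foreground_mask": 15,  # foreground/background mask
    "part_index": 25,       # canonical body-part index map
    "u_coordinate": 25,     # U coordinate on the canonical body surface
    "v_coordinate": 25,     # V coordinate on the canonical body surface
}

def head_output_shapes(batch_size):
    """Return the (N, C, H, W) tensor shape each head would produce."""
    h, w = HEAD_SIZE
    return {name: (batch_size, c, h, w) for name, c in HEAD_CHANNELS.items()}

shapes = head_output_shapes(batch_size=2)
```

Under these assumptions, `shapes["foreground_mask"]` is `(2, 15, 56, 56)` and each of the other three heads yields `(2, 25, 56, 56)`.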