MoCap-guided Data Augmentation for 3D Pose Estimation in the Wild

Authors: Gregory Rogez, Cordelia Schmid

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We address 3D pose estimation in the wild. However, there does not exist a dataset of real-world images with 3D annotations. We thus evaluate our method in two different settings using existing datasets: (1) we validate our 3D pose predictions using Human3.6M [13]... (2) we evaluate on Leeds Sport dataset (LSP) [16]... Our method outperforms the state of the art in terms of 3D pose estimation in controlled environments (Human3.6M) and shows promising results for in-the-wild images (LSP).
Researcher Affiliation | Academia | Grégory Rogez, Cordelia Schmid, Inria Grenoble Rhône-Alpes, Laboratoire Jean Kuntzmann, France
Pseudocode | No | The paper describes the synthesis engine and CNN architecture but does not include any explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository.
Open Datasets | Yes | We use the CMU Motion Capture Dataset and the Human3.6M 3D poses [13], and for 2D pose annotations the MPII-LSP-extended dataset [24] and the Human3.6M 2D poses and images.
Dataset Splits | Yes | We follow the protocol introduced in [18] and employed in [42]: we consider six subjects (S1, S5, S6, S7, S8 and S9) for training, use every 64th frame of subject S11 for testing and evaluate the 3D pose error (mm) averaged over the 13 joints. (A minimal sketch of this evaluation protocol is given after the table.)
Hardware Specification | No | We acknowledge the support of NVIDIA with the donation of the GPUs used for this research. Beyond this acknowledgment of donated NVIDIA GPUs, the paper does not specify GPU models, counts, or any other hardware details.
Software Dependencies | No | The paper mentions using the AlexNet CNN architecture [19] and the VGG-16 architecture [33] but does not provide specific software environment or library version numbers.
Experiment Setup | Yes | We empirically found that K=5000 clusters was a sufficient number of clusters. Given a library of MoCap data and a set of camera views, we synthesize for each 3D pose a 220x220 image. We found that it performed better than just fine-tuning a model pre-trained on ImageNet (3D error of 88.1mm vs 98.3mm with fine-tuning). (A hedged sketch of the pose clustering step also follows the table.)
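
To make the quoted Human3.6M protocol concrete, the following minimal Python sketch selects every 64th frame of subject S11 and computes the 3D pose error in millimetres averaged over the 13 joints. Only the subject split, the frame stride, and the error metric come from the quoted protocol; the function names and the toy data are illustrative assumptions.

import numpy as np

TRAIN_SUBJECTS = ("S1", "S5", "S6", "S7", "S8", "S9")  # training subjects in the quoted protocol
TEST_SUBJECT = "S11"
TEST_FRAME_STRIDE = 64  # every 64th frame of S11 is kept for testing

def select_test_frames(poses_s11):
    # poses_s11: array of shape (num_frames, 13, 3), joint positions in mm
    return poses_s11[::TEST_FRAME_STRIDE]

def mean_joint_error_mm(pred, gt):
    # 3D pose error (mm) averaged over the 13 joints and all test frames
    assert pred.shape == gt.shape and pred.shape[-2:] == (13, 3)
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

if __name__ == "__main__":
    # Toy check with synthetic poses standing in for real Human3.6M data.
    rng = np.random.default_rng(0)
    gt = select_test_frames(rng.normal(size=(6400, 13, 3)) * 100.0)
    pred = gt + rng.normal(size=gt.shape) * 10.0
    print(f"mean per-joint error: {mean_joint_error_mm(pred, gt):.1f} mm")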
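
The experiment-setup quote states that K=5000 pose clusters were sufficient but not how the clustering is performed. The sketch below assumes mini-batch K-means over flattened 3D joint coordinates (scikit-learn's MiniBatchKMeans); both the algorithm and the feature choice are assumptions for illustration, not details taken from the paper.

import numpy as np
from sklearn.cluster import MiniBatchKMeans

K = 5000  # number of pose clusters reported as sufficient in the paper

def cluster_pose_library(poses_3d, k=K, seed=0):
    # poses_3d: array of shape (num_poses, num_joints, 3) from the MoCap library.
    # Flattening each pose into one feature vector and using mini-batch K-means
    # are assumptions; the excerpt only states that K=5000 clusters suffice.
    features = poses_3d.reshape(len(poses_3d), -1)
    km = MiniBatchKMeans(n_clusters=k, random_state=seed)
    labels = km.fit_predict(features)
    centroids = km.cluster_centers_.reshape(k, *poses_3d.shape[1:])
    return labels, centroids

Mini-batch K-means is chosen here only to keep clustering tractable at K=5000 on a large pose library; with a smaller library, standard K-means would work identically (and the library must contain at least K poses).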