Self-supervised Learning of Motion Capture

Authors: Hsiao-Yu Tung, Hsiao-Wei Tung, Ersin Yumer, Katerina Fragkiadaki

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide quantitative and qualitative results on 3D dense human shape tracking in the SURREAL [35] and H3.6M [22] datasets. We compare our learning-based model against two baselines: (1) Pretrained, a model that uses only supervised training from synthetic data, without self-supervised adaptation; and (2) Direct optimization, a model that uses our differentiable self-supervised losses but, instead of optimizing neural network weights, optimizes directly over body mesh parameters (θ, β), rotation (R), translation (T), and focal length (f). (A hedged sketch of this direct-optimization baseline appears after the table.)
Researcher Affiliation | Collaboration | Hsiao-Yu Fish Tung (1), Hsiao-Wei Tung (2), Ersin Yumer (3), Katerina Fragkiadaki (1); (1) Carnegie Mellon University, Machine Learning Department; (2) University of Pittsburgh, Department of Electrical and Computer Engineering; (3) Adobe Research
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not contain any statement indicating that the source code for the described methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | We test our method on two datasets: Surreal [35] and H3.6M [22]. Surreal is currently the largest synthetic dataset for people in motion. Human3.6M (H3.6M) is the largest real video dataset with annotated 3D human skeletons. It contains videos of actors performing activities and provides annotations of body joint locations in 2D and 3D at every frame, recorded through a Vicon system.
Dataset Splits | Yes | We split the dataset into train and test video sequences. Our model is first trained using supervised skeleton and surface parameters on the training set of the Surreal dataset. Then, it is self-supervised using differentiable rendering and re-projection error minimization on two test sets, one in the Surreal dataset and one in H3.6M.
Hardware Specification | No | The paper does not explicitly describe the hardware used for running its experiments, such as specific GPU or CPU models.
Software Dependencies | Yes | The model is trained with a gradient descent optimizer with learning rate 0.0001 and is implemented in TensorFlow v1.1.0 [1].
Experiment Setup | Yes | Our model architecture consists of 5 convolution blocks. Each block contains two convolutional layers with filter sizes 5×5 (stride 2) and 3×3 (stride 1), followed by batch normalization and leaky ReLU activation. The first block contains 64 channels, and we double the number of channels after each block. On top of these blocks, we add 3 fully connected layers and shrink the size of the final layer to match our desired outputs. The input image to our model is 128×128. The model is trained with a gradient descent optimizer with learning rate 0.0001. (A minimal architecture sketch follows the table.)
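
To make the Experiment Setup row concrete, below is a minimal sketch of the described encoder in Python with tf.keras. The paper's own code (in TensorFlow v1.1.0) is not available, so this is not the authors' implementation; the intermediate fully connected widths (1024, 512) and the output dimensionality are assumptions, while the block structure, channel counts, input size, optimizer, and learning rate follow the quoted description.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_encoder(num_outputs, input_size=128):
    """Encoder sketch: 5 blocks of (5x5 stride-2 conv, 3x3 stride-1 conv),
    each conv followed by batch normalization and leaky ReLU; channels start
    at 64 and double per block; 3 fully connected layers shrink to the
    desired output size."""
    inputs = layers.Input(shape=(input_size, input_size, 3))
    x = inputs
    channels = 64
    for _ in range(5):
        x = layers.Conv2D(channels, 5, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU()(x)
        x = layers.Conv2D(channels, 3, strides=1, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.LeakyReLU()(x)
        channels *= 2
    x = layers.Flatten()(x)
    # Intermediate widths below are assumptions; the paper only states that
    # the final layer matches the desired output dimension.
    x = layers.LeakyReLU()(layers.Dense(1024)(x))
    x = layers.LeakyReLU()(layers.Dense(512)(x))
    outputs = layers.Dense(num_outputs)(x)
    return models.Model(inputs, outputs)

# Plain gradient descent with learning rate 1e-4, as stated in the paper.
# num_outputs = 85 (SMPL pose + shape + camera) is an illustrative assumption.
model = build_encoder(num_outputs=85)
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4), loss="mse")
```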
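
The "Direct optimization" baseline quoted in the Research Type row keeps the same differentiable self-supervised losses but updates the per-video parameters themselves rather than network weights. The sketch below illustrates that setup only in outline: the loss is a dummy placeholder standing in for the paper's keypoint re-projection, segmentation re-rendering, and motion terms, and the parameter dimensionalities (standard SMPL sizes) are assumptions.

```python
import tensorflow as tf

# Trainable copies of the parameters the baseline optimizes directly.
theta = tf.Variable(tf.zeros([72]))  # SMPL pose (assumed axis-angle per joint)
beta  = tf.Variable(tf.zeros([10]))  # SMPL shape coefficients (assumed size)
R     = tf.Variable(tf.zeros([3]))   # global rotation
T     = tf.Variable(tf.zeros([3]))   # global translation
f     = tf.Variable(1.0)             # focal length

def self_supervised_loss(theta, beta, R, T, f):
    # Dummy stand-in: a real implementation would pose the SMPL mesh with
    # (theta, beta), transform it with (R, T), project with focal length f,
    # and penalize disagreement with 2D keypoints, segmentation masks, and
    # optical flow extracted from the video.
    return (tf.reduce_sum(theta ** 2) + tf.reduce_sum(beta ** 2)
            + tf.reduce_sum(R ** 2) + tf.reduce_sum(T ** 2) + (f - 1.0) ** 2)

opt = tf.keras.optimizers.SGD(learning_rate=1e-4)
params = [theta, beta, R, T, f]
for step in range(1000):
    with tf.GradientTape() as tape:
        loss = self_supervised_loss(theta, beta, R, T, f)
    grads = tape.gradient(loss, params)
    opt.apply_gradients(zip(grads, params))
```

The only difference from the learned model is which quantities receive the gradients: the pretrained network warm-starts the parameters, and self-supervision then either fine-tunes the network (the paper's approach) or, as in this baseline, the parameters directly.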