Learning Human Pose Estimation Features with Convolutional Networks
Authors: Arjun Jain; Jonathan Tompson; Mykhaylo Andriluka; Graham W. Taylor; Christoph Bregler
ICLR 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper introduces a new architecture for human pose estimation using a multilayer convolutional network architecture and a modified learning technique that learns low-level features and a higher-level weak spatial model. Unconstrained human pose estimation is one of the hardest problems in computer vision, and our new architecture and learning schema shows improvement over the current state-of-the-art. The main contribution of this paper is showing, for the first time, that a specific variation of deep learning is able to meet the performance, and in many cases outperform, existing traditional architectures on this task. We evaluated our architecture on the FLIC [38] dataset, which is comprised of 5003 still RGB images taken from an assortment of Hollywood movies. |
| Researcher Affiliation | Academia | Arjun Jain New York University ajain@nyu.edu Jonathan Tompson New York University tompson@cims.nyu.edu Mykhaylo Andriluka MPI Saarbruecken andriluk@mpi-inf.mpg.de Graham W. Taylor University of Guelph gwtaylor@uoguelph.ca Christoph Bregler New York University chris.bregler@nyu.edu |
| Pseudocode | No | The paper describes the convolutional network architecture and the spatial model using text and equations, but it does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using Theano for training, but it does not state that the authors' implementation code is open-source or provide a link to it. |
| Open Datasets | Yes | We evaluated our architecture on the FLIC [38] dataset, which is comprised of 5003 still RGB images taken from an assortment of Hollywood movies. We use 3987 training images from the dataset, which we also mirror horizontally to obtain a total of 3987 × 2 = 7974 examples. |
| Dataset Splits | No | From the training set images, we set aside a validation set to tune the network hyper-parameters, such as number and size of features, learning rate, momentum coefficient, etc. The paper mentions setting aside a validation set but does not provide specific numbers or percentages for its size, which is needed for reproducibility. |
| Hardware Specification | Yes | Training each convnet on an NVIDIA TITAN GPU takes 1.9ms per patch (fprop + bprop) = 41min total. We test on a CPU cluster with 5000 nodes. |
| Software Dependencies | No | For training the convnet we use Theano [7], which provides a Python-based framework for efficient GPU processing and symbolic differentiation of complex compound functions. The paper mentions 'Theano' but does not specify its version number or versions for any other key software components. |
| Experiment Setup | No | From the training set images, we set aside a validation set to tune the network hyper-parameters, such as number and size of features, learning rate, momentum coefficient, etc. We used Nesterov momentum [43] as well as RMSPROP [46] to accelerate learning and we used L2 regularization and dropout [21] on the input to each of the fully-connected linear stages to reduce over-fitting the restricted-size training set. The paper lists types of hyperparameters and regularization techniques but does not provide specific numerical values for them (e.g., learning rate value, dropout rate). |
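The horizontal-mirroring augmentation quoted in the Open Datasets row (3987 training images doubled to 7974 examples) can be sketched as follows. This is an illustrative assumption, not the authors' actual pipeline: the image resolution and array layout are placeholders, and only the flip-and-double step comes from the paper.

```python
import numpy as np

def mirror_augment(images: np.ndarray) -> np.ndarray:
    """Double a batch of NxHxWxC images by appending horizontal flips.

    Hedged sketch of the augmentation described in the paper; the
    NHWC layout is an assumption, not the authors' data format.
    """
    flipped = images[:, :, ::-1, :]  # flip each image along the width axis
    return np.concatenate([images, flipped], axis=0)

# 3987 FLIC training images -> 3987 x 2 = 7974 examples after mirroring.
# The 32x32x3 resolution is a placeholder, not the paper's input size.
batch = np.zeros((3987, 32, 32, 3), dtype=np.float32)
augmented = mirror_augment(batch)
print(augmented.shape[0])  # 7974
```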
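The Experiment Setup row notes that the paper names its optimization techniques (Nesterov momentum, RMSPROP, L2 regularization, dropout) without reporting their numerical values. A minimal sketch of one of those pieces, a Nesterov-momentum update with L2 weight decay, is below; the learning rate, momentum coefficient, and decay strength are placeholder assumptions, since the paper does not provide them.

```python
import numpy as np

def nesterov_step(w, v, grad_fn, lr=0.01, mu=0.9, l2=1e-4):
    """One Nesterov-momentum SGD update with L2 regularization.

    lr, mu, and l2 are assumed placeholder values; the paper does not
    report the hyperparameters it used.
    """
    lookahead = w + mu * v                    # evaluate gradient ahead of the current point
    g = grad_fn(lookahead) + l2 * lookahead   # L2 term adds weight decay
    v = mu * v - lr * g                       # update velocity
    return w + v, v

# Usage: minimize the toy quadratic f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = nesterov_step(w, v, grad_fn=lambda x: x)
```

In practice these updates would be applied per mini-batch to the convnet weights, combined with dropout on the fully-connected inputs as the paper describes.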