Learning Human Pose Estimation Features with Convolutional Networks
Authors: Arjun Jain; Jonathan Tompson; Mykhaylo Andriluka; Graham W. Taylor; Christoph Bregler
ICLR 2014 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | This paper introduces a new architecture for human pose estimation using a multilayer convolutional network architecture and a modified learning technique that learns low-level features and a higher-level weak spatial model. Unconstrained human pose estimation is one of the hardest problems in computer vision, and our new architecture and learning schema shows improvement over the current state-of-the-art. The main contribution of this paper is showing, for the first time, that a specific variation of deep learning is able to meet the performance, and in many cases outperform, existing traditional architectures on this task. We evaluated our architecture on the FLIC [38] dataset, which is comprised of 5003 still RGB images taken from an assortment of Hollywood movies. |
| Researcher Affiliation | Academia | Arjun Jain New York University ajain@nyu.edu Jonathan Tompson New York University tompson@cims.nyu.edu Mykhaylo Andriluka MPI Saarbruecken andriluk@mpi-inf.mpg.de Graham W. Taylor University of Guelph gwtaylor@uoguelph.ca Christoph Bregler New York University chris.bregler@nyu.edu |
| Pseudocode | No | The paper describes the convolutional network architecture and the spatial model using text and equations, but it does not provide any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using Theano for training, but it does not state that the authors' implementation code is open-source or provide a link to it. |
| Open Datasets | Yes | We evaluated our architecture on the FLIC [38] dataset, which is comprised of 5003 still RGB images taken from an assortment of Hollywood movies. We use 3987 training images from the dataset, which we also mirror horizontally to obtain a total of 3987 × 2 = 7974 examples. |
| Dataset Splits | No | From the training set images, we set aside a validation set to tune the network hyper-parameters, such as number and size of features, learning rate, momentum coefficient, etc. The paper mentions setting aside a validation set but does not provide specific numbers or percentages for its size, which is needed for reproducibility. |
| Hardware Specification | Yes | Training each convnet on an NVIDIA TITAN GPU takes 1.9ms per patch (fprop + bprop) = 41min total. We test on a CPU cluster with 5000 nodes. |
| Software Dependencies | No | For training the convnet we use Theano [7], which provides a Python-based framework for efficient GPU processing and symbolic differentiation of complex compound functions. The paper mentions 'Theano' but does not specify its version number or versions for any other key software components. |
| Experiment Setup | No | From the training set images, we set aside a validation set to tune the network hyper-parameters, such as number and size of features, learning rate, momentum coefficient, etc. We used Nesterov momentum [43] as well as RMSPROP [46] to accelerate learning and we used L2 regularization and dropout [21] on the input to each of the fully-connected linear stages to reduce over-fitting the restricted-size training set. The paper lists types of hyperparameters and regularization techniques but does not provide specific numerical values for them (e.g., learning rate value, dropout rate). |
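The horizontal-mirroring augmentation quoted in the Open Datasets row (3987 training images doubled to 7974 examples) can be sketched as follows. This is an illustrative assumption, not the authors' actual pipeline: the image resolution and array layout are placeholders, and only the flip-and-double step comes from the paper.

```python
import numpy as np

def mirror_augment(images: np.ndarray) -> np.ndarray:
    """Double a batch of NxHxWxC images by appending horizontal flips.

    Hedged sketch of the augmentation described in the paper; the
    NHWC layout is an assumption, not the authors' data format.
    """
    flipped = images[:, :, ::-1, :]  # flip each image along the width axis
    return np.concatenate([images, flipped], axis=0)

# 3987 FLIC training images -> 3987 x 2 = 7974 examples after mirroring.
# The 32x32x3 resolution is a placeholder, not the paper's input size.
batch = np.zeros((3987, 32, 32, 3), dtype=np.float32)
augmented = mirror_augment(batch)
print(augmented.shape[0])  # 7974
```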
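The Experiment Setup row notes that the paper names its optimization techniques (Nesterov momentum, RMSPROP, L2 regularization, dropout) without reporting their numerical values. A minimal sketch of one of those pieces, a Nesterov-momentum update with L2 weight decay, is below; the learning rate, momentum coefficient, and decay strength are placeholder assumptions, since the paper does not provide them.

```python
import numpy as np

def nesterov_step(w, v, grad_fn, lr=0.01, mu=0.9, l2=1e-4):
    """One Nesterov-momentum SGD update with L2 regularization.

    lr, mu, and l2 are assumed placeholder values; the paper does not
    report the hyperparameters it used.
    """
    lookahead = w + mu * v                    # evaluate gradient ahead of the current point
    g = grad_fn(lookahead) + l2 * lookahead   # L2 term adds weight decay
    v = mu * v - lr * g                       # update velocity
    return w + v, v

# Usage: minimize the toy quadratic f(w) = 0.5 * ||w||^2, whose gradient is w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(200):
    w, v = nesterov_step(w, v, grad_fn=lambda x: x)
```

In practice these updates would be applied per mini-batch to the convnet weights, combined with dropout on the fully-connected inputs as the paper describes.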