Associative Embedding: End-to-End Learning for Joint Detection and Grouping

Authors: Alejandro Newell, Zhiao Huang, Jia Deng

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show how to apply this method to multi-person pose estimation and report state-of-the-art performance on the MPII and MS-COCO datasets."
Researcher Affiliation | Academia | Alejandro Newell, Computer Science and Engineering, University of Michigan, Ann Arbor, MI, alnewell@umich.edu; Zhiao Huang*, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China, hza14@mails.tsinghua.edu.cn; Jia Deng, Computer Science and Engineering, University of Michigan, Ann Arbor, MI, jiadeng@umich.edu
Pseudocode | No | The paper describes the method in prose and diagrams but includes no explicit pseudocode or algorithm blocks.
Open Source Code | No | The paper mentions using TensorFlow but neither states that its own source code is open nor provides a link to an implementation.
Open Datasets | Yes | "We evaluate on two datasets: MS-COCO [27] and MPII Human Pose [3]. MPII Human Pose consists of about 25k images and contains around 40k total annotated people (three-quarters of which are available for training). MS-COCO [27] consists of around 60K training images with more than 100K people with annotated keypoints."
Dataset Splits | Yes | "MPII Human Pose consists of about 25k images and contains around 40k total annotated people (three-quarters of which are available for training). MS-COCO [27] consists of around 60K training images with more than 100K people with annotated keypoints. We report performance on two test sets, a development test set (test-dev) and a standard test set (test-std)."
Hardware Specification | No | The paper mentions using TensorFlow but does not specify the CPU or GPU models, or any other hardware, used for the experiments.
Software Dependencies | No | "We train the network using ... Tensorflow [2]." No version number is given for TensorFlow or any other software dependency.
Experiment Setup | Yes | "The network used for this task consists of four stacked hourglass modules, with an input size of 512 × 512 and an output resolution of 128 × 128. We train the network using a batch size of 32 with a learning rate of 2e-4 (dropped to 1e-5 after about 150k iterations) using Tensorflow [2]. The associative embedding loss is weighted by a factor of 1e-3 relative to the MSE loss of the detection heatmaps."
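The hyperparameters quoted in the Experiment Setup row can be collected into a single configuration sketch. Since the authors released no code, the field and function names below are illustrative assumptions, not the paper's own identifiers:

```python
# Training configuration as reported in the paper; key names are
# illustrative (no official code was released).
TRAIN_CONFIG = {
    "architecture": "4x stacked hourglass",
    "input_size": (512, 512),        # network input resolution
    "output_size": (128, 128),       # heatmap/tag output resolution
    "batch_size": 32,
    "learning_rate": 2e-4,
    "lr_drop": {"iteration": 150_000, "new_rate": 1e-5},
    "embedding_loss_weight": 1e-3,   # relative to the detection MSE
}


def total_loss(detection_mse, embedding_loss,
               w=TRAIN_CONFIG["embedding_loss_weight"]):
    """Combined objective: heatmap MSE plus down-weighted grouping loss."""
    return detection_mse + w * embedding_loss
```

The 1e-3 weighting means the grouping term only meaningfully shifts the gradient once its raw magnitude is large relative to the heatmap MSE.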
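The associative embedding loss that this weight applies to is described only in prose in the paper: per-person tag values are pulled toward a reference embedding, and reference embeddings of different people are pushed apart. A minimal NumPy sketch of that grouping objective follows; the function name, signature, and the sigma default are assumptions for illustration, not the authors' implementation:

```python
import numpy as np


def associative_embedding_loss(person_tags, sigma=1.0):
    """Grouping loss sketch over per-person tag embeddings.

    person_tags: list of 1-D arrays, one per person, each holding the
    predicted tag values at that person's annotated joint locations.
    (Hypothetical interface; the paper provides no reference code.)
    """
    # Pull term: each person's tags should match their mean (reference) tag.
    refs, pull = [], 0.0
    for tags in person_tags:
        ref = tags.mean()
        refs.append(ref)
        pull += np.mean((tags - ref) ** 2)
    pull /= max(len(person_tags), 1)

    # Push term: reference tags of different people should be far apart;
    # a Gaussian penalty decays as reference embeddings separate.
    n = len(refs)
    push = 0.0
    for i in range(n):
        for j in range(n):
            push += np.exp(-0.5 * (refs[i] - refs[j]) ** 2 / sigma ** 2)
    push /= max(n * n, 1)

    return pull + push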