Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Kinematic-Structure-Preserved Representation for Unsupervised 3D Human Pose Estimation
Authors: Jogendra Nath Kundu, Siddharth Seth, Rahul M V, Mugalodi Rakesh, Venkatesh Babu Radhakrishnan, Anirban Chakraborty (pp. 11312–11319)
AAAI 2020 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments demonstrate our state-of-the-art unsupervised and weakly-supervised pose estimation performance on both Human3.6M and MPI-INF-3DHP datasets. Qualitative results on unseen environments further establish our superior generalization ability. |
| Researcher Affiliation | Academia | Jogendra Nath Kundu, Siddharth Seth, Rahul M V, Mugalodi Rakesh, R. Venkatesh Babu, Anirban Chakraborty Indian Institute of Science, Bangalore, India EMAIL, EMAIL, EMAIL |
| Pseudocode | No | The paper provides architectural diagrams and mathematical formulations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not mention the release of source code or provide a link to a code repository. |
| Open Datasets | Yes | Datasets. The base-model is trained on a mixture of two datasets, i.e. Human3.6M and an in-house collection of YouTube videos (also referred as YTube). In contrast to the in-studio H3.6M dataset, YTube contains human subjects in diverse apparel and BG scenes performing varied forms of motion (usually dance forms such as western, modern, contemporary etc.). Note that all samples from H3.6M contribute to the paired dataset Dp, whereas 40% samples in YTube contributed to Dp and the rest to Dunp based on the associated BG motion criteria. However, as we do not have ground-truth 3D pose for the samples from YTube (in-the-wild dataset), we use MPI-INF-3DHP (also referred as 3DHP) to quantitatively benchmark generalization of the proposed pose estimation framework. |
| Dataset Splits | No | The paper describes training on mixed datasets (YTube+H3.6M) and finetuning on H3.6M, and evaluates on standard test protocols. However, it does not provide specific percentages or sample counts for validation splits, nor does it explicitly mention a validation set with detailed split information. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running experiments. |
| Software Dependencies | No | The paper mentions using Resnet-50 as a base pose encoder but does not list specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9, and CUDA 11.1'). |
| Experiment Setup | Yes | We use Resnet-50 (till res4f) with ImageNet-pretrained parameters as the base pose encoder EP, whereas the appearance encoder is designed separately using 10 Convolutions. EP later divides into two parallel branches of fully-connected layers dedicated for vk and c respectively. We use J = 17 for all our experiments as shown in Fig. 1. The channel-wise aggregation of fam (16-channels) and fhm (17-channels) is passed through two convolutional layers to obtain f2D (128-maps), which is then concatenated with fa (512-maps) to form the input for DI (each with 14×14 spatial dimension). Our experiments use different AdaGrad optimizers (learning rate: 0.001) for each individual loss components in alternate training iterations, thereby avoiding any hyper-parameter tuning. |
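The channel bookkeeping in the quoted setup can be sanity-checked with a minimal sketch. This is not the authors' code: only the channel counts (16, 17, 128, 512), J = 17, and the 14×14 spatial size come from the paper; all variable names here are our own illustrative choices.

```python
# Feature-map bookkeeping implied by the Experiment Setup row.
# All names are illustrative; channel counts are quoted from the paper.

J = 17                 # number of joints used in all experiments
FAM_CHANNELS = 16      # f_am: appearance-related maps
FHM_CHANNELS = J       # f_hm: per-joint heatmaps (17 channels)
F2D_CHANNELS = 128     # f_2D: output of the two conv layers
FA_CHANNELS = 512      # f_a: appearance code
SPATIAL = (14, 14)     # spatial dimension of each map fed to D_I

# Channel-wise aggregation of f_am and f_hm before the two conv layers.
aggregated_channels = FAM_CHANNELS + FHM_CHANNELS   # 16 + 17 = 33

# f_2D concatenated with f_a forms the decoder input D_I.
decoder_input_channels = F2D_CHANNELS + FA_CHANNELS  # 128 + 512 = 640

print(aggregated_channels, decoder_input_channels, SPATIAL)
```

Under these assumptions, the image decoder DI receives a 640-channel tensor at 14×14 resolution; the 33-channel aggregate exists only transiently before the two conv layers reduce it to 128 maps.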