Deep Reinforcement Learning for Active Human Pose Estimation
Authors: Erik Gärtner, Aleksis Pirinen, Cristian Sminchisescu
AAAI 2020, pp. 10835-10844
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our model using single- and multi-target estimators with strong results in both settings. Our system further learns automatic stopping conditions in time and transition functions to the next temporal processing step in videos. In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates compared to strong multi-view baselines. |
| Researcher Affiliation | Collaboration | Erik Gärtner (1), Aleksis Pirinen (1), Cristian Sminchisescu (1,2,3); (1) Department of Mathematics, Faculty of Engineering, Lund University; (2) Institute of Mathematics of the Romanian Academy; (3) Google Research; {erik.gartner, aleksis.pirinen, cristian.sminchisescu}@math.lth.se |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not provide a concrete statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | The experiments use the publicly available Panoptic multi-view dataset: "In extensive experiments with the Panoptic multi-view setup, and for complex scenes containing multiple people, we show that our model learns to select viewpoints that yield significantly more accurate pose estimates compared to strong multi-view baselines." |
| Dataset Splits | Yes | The scenes are randomly split into training, validation and test sets with 10, 4 and 6 scenes, respectively. (A minimal sketch of such a split follows the table.) |
| Hardware Specification | No | The paper mentions runtimes for DMHS-based systems but does not specify any particular hardware (GPU/CPU models, etc.) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Faster R-CNN, DMHS, MubyNet, and the Adam optimizer but does not specify version numbers for these software components or any other libraries. |
| Experiment Setup | Yes | We use 5 active-sequences, each of length 10, to approximate the policy gradient, and update the policy parameters using Adam (Kingma and Ba 2015). As standard, to reduce variance we normalize cumulative rewards for each episode to zero mean and unit variance over the batch. The maximum trajectory length is set to 8 views including the initial one (10 in the multi-target mode, as it may require more views to reconstruct all people). The viewpoint selection and continue actions are trained jointly for 80k episodes. The learning rate is initially set to 5e-7 and is halved at 720k and 1440k agent steps. We linearly increase the precision parameters m_a and m_e of the von Mises distributions from (1, 10) to (25, 50) in training, making the viewpoint selection increasingly focused on high-rewarding regions as training proceeds. We use median averaging for fusing poses, cf. (2). ... The improvement threshold τ is 0.07 for DMHS and 0.04 for MubyNet. (Hedged sketches of this training schedule and of the pose fusion and stopping rule follow the table.) |
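
The training schedule quoted in the Experiment Setup row can be made concrete with a short sketch. The snippet below is a hedged illustration, not the authors' code: it assumes the viewpoint policy outputs mean azimuth/elevation angles that are sampled through von Mises distributions whose precision parameters (m_a, m_e) are linearly annealed from (1, 10) to (25, 50) over the 80k training episodes, and that cumulative rewards are normalized to zero mean and unit variance over each batch of 5 active-sequences before the policy-gradient update. All function and variable names are illustrative.

```python
import numpy as np

# Hedged sketch of the schedule in the Experiment Setup row; the helper
# names and the interpolation variable (episode index) are assumptions.

TOTAL_EPISODES = 80_000  # viewpoint selection and continue actions trained jointly
BATCH_SIZE = 5           # active-sequences per policy-gradient estimate
MAX_VIEWS = 8            # per trajectory, incl. the initial view (10 in multi-target mode)
INIT_LR = 5e-7           # halved at 720k and 1440k agent steps

M_START = (1.0, 10.0)    # (m_a, m_e): von Mises precision at the start of training
M_END = (25.0, 50.0)     # (m_a, m_e): precision at the end of training


def precision(episode: int) -> tuple[float, float]:
    """Linearly increase the von Mises precision parameters during training,
    making viewpoint selection increasingly focused on high-reward regions."""
    frac = min(episode / TOTAL_EPISODES, 1.0)
    m_a = M_START[0] + frac * (M_END[0] - M_START[0])
    m_e = M_START[1] + frac * (M_END[1] - M_START[1])
    return m_a, m_e


def sample_viewpoint(mu_a: float, mu_e: float, episode: int) -> tuple[float, float]:
    """Sample the next azimuth/elevation from von Mises distributions centred
    at the policy's predicted mean angles."""
    m_a, m_e = precision(episode)
    return np.random.vonmises(mu_a, m_a), np.random.vonmises(mu_e, m_e)


def normalize_returns(batch_returns):
    """Normalize cumulative rewards to zero mean and unit variance over the
    batch, the standard variance-reduction step mentioned in the paper."""
    r = np.asarray(batch_returns, dtype=np.float64)
    return (r - r.mean()) / (r.std() + 1e-8)
```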
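
The median pose fusion and the improvement threshold τ can be sketched in the same spirit. The paper states that poses are fused by median averaging (its eq. (2)) and that τ is 0.07 for DMHS and 0.04 for MubyNet, but the exact quantity τ is compared against is not quoted here, so the stopping rule below is only indicative.

```python
import numpy as np


def fuse_poses(per_view_poses):
    """Fuse per-view 3d pose estimates by taking the elementwise median over
    the views selected so far (median averaging, cf. eq. (2) in the paper)."""
    # per_view_poses: list of (num_joints, 3) arrays, one per selected viewpoint
    return np.median(np.stack(per_view_poses, axis=0), axis=0)


# Improvement thresholds from the paper; comparing them against a generic
# "improvement" value is an assumption made for illustration only.
TAU = {"dmhs": 0.07, "mubynet": 0.04}


def keep_selecting_views(improvement: float, estimator: str = "dmhs") -> bool:
    """Continue adding viewpoints while the fused estimate still improves by
    more than the estimator-specific threshold tau."""
    return improvement > TAU[estimator]
```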
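
Finally, the 10/4/6 scene split in the Dataset Splits row corresponds to a simple random partition. The sketch below assumes 20 named Panoptic scenes and a fixed seed; both the seed and the function name are illustrative, since the concrete split used by the authors is not reproduced in the quoted text.

```python
import random


def split_scenes(scene_names, seed=0):
    """Randomly partition the scenes into train/val/test subsets of sizes
    10, 4 and 6, matching the split sizes reported in the paper."""
    assert len(scene_names) == 20, "expects 10 + 4 + 6 scenes"
    scenes = list(scene_names)
    random.Random(seed).shuffle(scenes)
    return {"train": scenes[:10], "val": scenes[10:14], "test": scenes[14:]}
```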