Proactive Multi-Camera Collaboration for 3D Human Pose Estimation
Authors: Hai Ci, Mickel Liu, Xuehai Pan, Fangwei Zhong, Yizhou Wang
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our proposed method in four photo-realistic UE4 environments to ensure validity and generalizability. Empirical results show that our method outperforms fixed and active baselines in various scenarios with different numbers of cameras and humans. |
| Researcher Affiliation | Academia | School of Computer Science, Peking University; Nat'l Key Lab. of GAI & Beijing Institute for GAI (BIGAI); School of Intelligence Science and Technology, Peking University; Inst. for AI, Peking University. {cihai, XuehaiPan, zfw, yizhou.wang}@pku.edu.cn, mickelliu@stu.pku.edu.cn |
| Pseudocode | Yes | A TRAINING ALGORITHM PSEUDOCODE. Algorithm 1: Learning Multi-Camera Collaboration (CTCR + WDL) |
| Open Source Code | Yes | To help facilitate more fruitful research on this topic, we release our environments with OpenAI Gym-API (Brockman et al., 2016) integration, together with a dedicated visualization tool. B.2 LICENSE: All assets used in the environment are commercially available and obtained from the UE4 Marketplace. The environment and tools developed in this work are licensed under Apache License 2.0. |
| Open Datasets | Yes | For the evaluations of the learned policies, we build photo-realistic environments (UnrealPose) using Unreal Engine 4 (UE4) and UnrealCV (Qiu et al., 2017). These environments can simulate realistically behaving crowds with assurances of high fidelity and customizability. We train the agents on a Blank environment and validate their policies on three unseen scenarios with different landscapes, levels of illumination, human appearances, and various quantities of cameras and humans. To help facilitate more fruitful research on this topic, we release our environments with OpenAI Gym-API (Brockman et al., 2016) integration, together with a dedicated visualization tool. (A minimal Gym-style interaction sketch appears after this table.) |
| Dataset Splits | Yes | We train the agents on a Blank environment and validate their policies on three unseen scenarios with different landscapes, levels of illumination, human appearances, and various quantities of cameras and humans. |
| Hardware Specification | Yes | We run each experiment with 8 NVIDIA RTX 2080 Ti GPUs. The average time data are generated on a computing node with one NVIDIA RTX 2080 Ti GPU and one Intel Xeon E5-2699A CPU. |
| Software Dependencies | No | We built four virtual environments for simulating active HPE in the wild using Unreal Engine 4 (UE4), a powerful 3D game engine that provides real-time, photo-realistic rendering for visually stunning video games. The agent performs 2D human detection and pose estimation on the observed RGB image with the YOLOv3 (Redmon & Farhadi, 2018) detector and the HRNet-w32 (Sun et al., 2019) pose estimator, respectively. We use the UnrealCV (Qiu et al., 2017) plugin as the medium to acquire images and annotations from the environment. (A minimal detect-and-estimate pipeline sketch appears after this table.) |
| Experiment Setup | Yes | D.1 TRAINING DETAILS: All control policies are trained in the Blank Env scene. At the testing stage, we apply zero-shot transfer of the learned policies to three realistic scenes: School Gym, Urban Street, and Wilderness. To simulate a dynamic human crowd that exhibits highly random behaviors, we sample arbitrary goals for each human and employ the built-in navigation system to generate collision-free trajectories. Each human walks at a random speed. To ensure generalization across different numbers of humans, we train our RL policy on a mixture of environments with 1 to 6 humans. The learning rate is set to 5e-4 with scheduled decay during the training phase; the annealing schedule is detailed in Table 2. The maximum episode length is 500 steps, the discount factor γ is 0.99, and the GAE horizon is 25 steps. Each sampling iteration produces a training batch of 700 steps, on which we perform 16 iterations with 2 SGD mini-batch updates each (i.e., SGD batch size = 350). Table 2 lists the common training hyper-parameters shared between the baseline models (MAPPO) and all of our methods. (These hyper-parameters are collected into a config sketch after this table.) |
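Since the released environments expose the OpenAI Gym API, a standard control loop is enough to drive them. The sketch below is illustrative only: the environment id and the action/observation layout are assumptions, not the released interface.

```python
# Minimal sketch of driving a Gym-style UnrealPose environment.
# "UnrealPose-Blank-v0" is a hypothetical id; the released package
# may register its environments under different names.
import gym

env = gym.make("UnrealPose-Blank-v0")  # hypothetical environment id
obs = env.reset()

for step in range(500):  # max episode length reported in the paper
    action = env.action_space.sample()  # random camera controls
    obs, reward, done, info = env.step(action)  # classic Gym 4-tuple
    if done:
        obs = env.reset()
env.close()
```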
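The per-camera perception step quoted under Software Dependencies (YOLOv3 detection followed by HRNet-w32 pose estimation) is a detect-crop-estimate pipeline. The sketch below is self-contained but uses stand-ins: `detect_humans` and `estimate_pose` are hypothetical placeholders for the actual detector and pose estimator, not the authors' code.

```python
import numpy as np

def detect_humans(rgb: np.ndarray) -> np.ndarray:
    """Stand-in for YOLOv3: return (N, 4) person boxes in xyxy pixels."""
    h, w = rgb.shape[:2]
    return np.array([[0.25 * w, 0.25 * h, 0.75 * w, 0.75 * h]])

def estimate_pose(crop: np.ndarray) -> np.ndarray:
    """Stand-in for HRNet-w32: return (J, 2) 2D joint locations."""
    return np.zeros((17, 2))  # 17 COCO-style joints

def perceive(rgb: np.ndarray) -> np.ndarray:
    """Per-camera pipeline: detect humans, crop, estimate 2D poses."""
    poses = []
    for x1, y1, x2, y2 in detect_humans(rgb).astype(int):
        poses.append(estimate_pose(rgb[y1:y2, x1:x2]))
    return np.stack(poses)  # (N, J, 2)

poses_2d = perceive(np.zeros((480, 640, 3), dtype=np.uint8))
```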
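For quick reference, the hyper-parameters quoted in the Experiment Setup row can be gathered into one place. The field names below follow common PPO-style conventions and are an assumption; the values are taken verbatim from the excerpt.

```python
# Training hyper-parameters reported in Appendix D.1 / Table 2,
# collected into a plain dict (key names are illustrative only).
train_config = {
    "lr": 5e-4,                  # with scheduled decay (see Table 2)
    "gamma": 0.99,               # discount factor
    "gae_horizon": 25,           # GAE horizon in steps
    "max_episode_steps": 500,    # maximum episode length
    "train_batch_size": 700,     # steps per sampling iteration
    "num_iters_per_batch": 16,   # passes over each training batch
    "sgd_minibatch_size": 350,   # 2 mini-batch updates per pass
    "num_humans_range": (1, 6),  # train on a mixture of 1-6 humans
}
```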