Occupancy Planes for Single-View RGB-D Human Reconstruction
Authors: Xiaoming Zhao, Yuan-Ting Hu, Zhongzheng Ren, Alexander G. Schwing
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the challenging S3D data we observe a simple classifier based on the OPlanes representation to yield compelling results, especially in difficult situations with partial occlusions due to other objects and partial visibility, which haven't been addressed by prior work. We evaluate the proposed approach on the challenging S3D (Hu et al. 2021) data and observe improvements over prior reconstruction work (Saito et al. 2020; Chibane, Alldieck, and Pons-Moll 2020) by a margin, particularly for occluded or partially visible humans. We also provide a comprehensive analysis to validate each of the design choices and results on real-world data. |
| Researcher Affiliation | Academia | University of Illinois Urbana-Champaign |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that its own source code is available or provide a link to it. It only mentions that PIFuHD (a baseline) has no training code available but provides an official checkpoint. |
| Open Datasets | Yes | We utilize S3D (Hu et al. 2021) to train our OPlanes-based human reconstruction model. S3D is a photo-realistic synthetic dataset built on the game GTA-V, providing ground-truth meshes together with masks and depths. |
| Dataset Splits | Yes | To construct our train and test set, we sample 27588 and 4300 meshes from its train and validation split respectively. |
| Hardware Specification | Yes | It takes around 22 hours to complete the training using an AMD EPYC 7543 32-Core Processor and an Nvidia RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions software components like 'Adam (Kingma and Ba 2015) optimizer', 'group norm (Wu and He 2018)', 'ReLU activation', 'ResNet50 (He et al. 2016) as the backbone of our FPN network', but does not provide specific version numbers for these software dependencies or libraries. |
| Experiment Setup | Yes | During training, the input has a resolution of H = 512 and W = 512. We operate at HO = 256, WO = 256, while the intermediate resolution is hO = 128 and wO = 128. During training, for each mesh, we randomly sample N = 10 planes in the range of [zmin, zmax] at each training iteration. I.e., the set ZN contains 10 depth values. We use the Adam (Kingma and Ba 2015) optimizer with a learning rate of 0.001. We set λBCE = 1.0 and λDICE = 1.0 (Eq. (10) and Eq. (13)). We set the batch size to 4 and train for 15 epochs. |
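The reported hyperparameters can be collected into a minimal sketch. Since the paper releases no code, all key names and the `total_loss` helper below are illustrative assumptions, not the authors' implementation:

```python
# Hypothetical summary of the training hyperparameters reported in the paper.
# Key names are illustrative; the paper does not publish a config file.
TRAIN_CONFIG = {
    "input_resolution": (512, 512),         # H x W of the RGB-D input
    "output_resolution": (256, 256),        # HO x WO of the occupancy planes
    "intermediate_resolution": (128, 128),  # hO x wO inside the network
    "planes_per_mesh": 10,                  # N depth values sampled in [zmin, zmax]
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "lambda_bce": 1.0,                      # weight on the BCE loss (Eq. (10))
    "lambda_dice": 1.0,                     # weight on the DICE loss (Eq. (13))
    "batch_size": 4,
    "epochs": 15,
}

def total_loss(bce: float, dice: float, cfg: dict = TRAIN_CONFIG) -> float:
    """Weighted sum of the two loss terms, per the lambdas above (assumed form)."""
    return cfg["lambda_bce"] * bce + cfg["lambda_dice"] * dice
```

With both lambdas at 1.0, the combined objective reduces to a plain sum of the BCE and DICE terms.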