Occupancy Planes for Single-View RGB-D Human Reconstruction

Authors: Xiaoming Zhao, Yuan-Ting Hu, Zhongzheng Ren, Alexander G. Schwing

AAAI 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the challenging S3D data we observe a simple classifier based on the OPlanes representation to yield compelling results, especially in difficult situations with partial occlusions due to other objects and partial visibility, which haven't been addressed by prior work. We evaluate the proposed approach on the challenging S3D (Hu et al. 2021) data and observe improvements over prior reconstruction work (Saito et al. 2020; Chibane, Alldieck, and Pons-Moll 2020) by a margin, particularly for occluded or partially visible humans. We also provide a comprehensive analysis to validate each of the design choices and results on real-world data. |
| Researcher Affiliation | Academia | University of Illinois Urbana-Champaign |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly state that its own source code is available or provide a link to it. It only mentions that PIFuHD (a baseline) has no training code available but provides an official checkpoint. |
| Open Datasets | Yes | We utilize S3D (Hu et al. 2021) to train our OPlanes-based human reconstruction model. S3D is a photo-realistic synthetic dataset built on the game GTA-V, providing ground-truth meshes together with masks and depths. |
| Dataset Splits | Yes | To construct our train and test sets, we sample 27588 and 4300 meshes from its train and validation splits, respectively. |
| Hardware Specification | Yes | It takes around 22 hours to complete the training using an AMD EPYC 7543 32-Core Processor and an Nvidia RTX A6000 GPU. |
| Software Dependencies | No | The paper mentions software components such as the Adam (Kingma and Ba 2015) optimizer, group norm (Wu and He 2018), ReLU activations, and ResNet50 (He et al. 2016) as the backbone of its FPN network, but does not provide version numbers for these dependencies or libraries. (A hedged sketch of these components follows the table.) |
| Experiment Setup | Yes | During training, the input has a resolution of H = 512 and W = 512. We operate at H_O = 256, W_O = 256, while the intermediate resolution is h_O = 128 and w_O = 128. For each mesh, we randomly sample N = 10 planes in the range [z_min, z_max] at each training iteration, i.e., the set Z_N contains 10 depth values. We use the Adam (Kingma and Ba 2015) optimizer with a learning rate of 0.001. We set λ_BCE = 1.0 and λ_DICE = 1.0 (Eq. (10) and Eq. (13)). We set the batch size to 4 and train for 15 epochs. (A training-step sketch follows the table.) |
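
The Software Dependencies row names building blocks but no versions. As a point of reference, here is a minimal PyTorch sketch of how the stated components (group norm and ReLU) typically combine in a conv block sitting on top of a ResNet50-based FPN. The block, its name, and the channel sizes are assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn as nn

# Hypothetical conv block combining the components the paper names:
# group norm (Wu and He 2018) followed by a ReLU activation.
class ConvGNReLU(nn.Module):
    def __init__(self, in_ch: int, out_ch: int, num_groups: int = 32):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
        self.norm = nn.GroupNorm(num_groups=num_groups, num_channels=out_ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.norm(self.conv(x)))

# E.g., a head over 256-channel FPN features; the ResNet50-FPN backbone itself
# can come from torchvision (torchvision.models.detection.backbone_utils).
head = ConvGNReLU(in_ch=256, out_ch=128)
```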
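
To make the Experiment Setup row concrete, below is a minimal PyTorch sketch of one training iteration as the table describes it: N = 10 depth planes sampled uniformly at random in [z_min, z_max] per mesh, a combined BCE + Dice objective with λ_BCE = λ_DICE = 1.0, and Adam with learning rate 0.001. The names `model` and `render_gt_occupancy`, the tensor shapes, and the Dice formulation are assumptions; the paper's exact Eq. (10) and Eq. (13) may differ in detail.

```python
import torch
import torch.nn.functional as F

N_PLANES = 10          # size of the sampled depth set Z_N
LR = 1e-3              # Adam learning rate from the table
LAMBDA_BCE = 1.0
LAMBDA_DICE = 1.0

def dice_loss(logits: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Soft Dice loss over per-plane occupancy maps (a common formulation)."""
    prob = torch.sigmoid(logits)
    inter = (prob * target).sum(dim=(-2, -1))
    denom = prob.sum(dim=(-2, -1)) + target.sum(dim=(-2, -1))
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()

def training_step(model, rgbd, z_min, z_max, render_gt_occupancy, optimizer):
    """One iteration: sample planes, predict per-plane occupancy, optimize.
    `model` and `render_gt_occupancy` are hypothetical stand-ins;
    z_min and z_max are treated as scalars here for simplicity."""
    B = rgbd.shape[0]  # rgbd: (B, 4, 512, 512) RGB-D input
    # Randomly sample N = 10 depth values per example in [z_min, z_max].
    z = z_min + (z_max - z_min) * torch.rand(B, N_PLANES, device=rgbd.device)
    logits = model(rgbd, z)              # (B, N, 256, 256) occupancy logits
    target = render_gt_occupancy(z)      # (B, N, 256, 256) float {0, 1} masks
    loss = (LAMBDA_BCE * F.binary_cross_entropy_with_logits(logits, target)
            + LAMBDA_DICE * dice_loss(logits, target))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# optimizer = torch.optim.Adam(model.parameters(), lr=LR)
# Run with batch size 4 for 15 epochs to match the reported schedule.
```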