FOF: Learning Fourier Occupancy Field for Monocular Real-time Human Reconstruction

Authors: Qiao Feng, Yebin Liu, Yu-Kun Lai, Jingyu Yang, Kun Li

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on both public dataset and real captured data show that our approach can reconstruct human meshes accurately and robustly in real-time. We use Chamfer distance, P2S (point-to-surface) distance, and normal image error for evaluation. Table 2: Comparison with the state-of-the-art methods. (These metrics are sketched after the table.)
Researcher Affiliation | Academia | Qiao Feng (Tianjin University, fengqiao@tju.edu.cn); Yebin Liu (Tsinghua University, liuyebin@mail.tsinghua.edu.cn); Yu-Kun Lai (Cardiff University, laiy4@cardiff.ac.uk); Jingyu Yang (Tianjin University, yjy@tju.edu.cn); Kun Li (Tianjin University, lik@tju.edu.cn)
Pseudocode | No | The paper describes procedures in text, such as 'FOF to mesh' and 'Mesh to FOF' in Section 3.3, but does not provide structured pseudocode or labeled algorithm blocks. (A sketch of the FOF-to-mesh decoding follows the table.)
Open Source Code | Yes | The code is available for research purposes at http://cic.tju.edu.cn/faculty/likun/projects/FOF.
Open Datasets | Yes | We collect 2038 high-quality human scans from Twindom (https://web.twindom.com/) and THuman2.0 [28] to train and evaluate our method.
Dataset Splits | Yes | We randomly select 1059 from Twindom and 368 from THuman2.0 as the training set, and 302 from Twindom and 105 from THuman2.0 as the test set. The remaining subjects are used as the validation set.
Hardware Specification | Yes | Our three stages are all implemented with PyTorch and running on a single RTX-3090 GPU.
Software Dependencies | No | The paper mentions 'implemented with PyTorch', 'OpenCV [4]', 'RVM [16]', and 'PyTorch3D [21]' but does not provide specific version numbers for these software components to ensure reproducibility.
Experiment Setup | Yes | In our implementation, N is chosen as 15, which is accurate enough for most 3D human geometries. The FOF is resized to a proper resolution (256×256 in our implementation). We use the L1 loss to train our FOF baseline and variants. To make the network more focused on the human geometry, we only supervise the human foreground region of the image. We use 512×512×512 resolution for all these methods. (A sketch of this foreground-masked loss follows the table.)
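
For context on the evaluation metrics quoted in the Research Type row, below is a minimal sketch of Chamfer and point-to-surface (P2S) distances between sampled point sets. The use of SciPy's cKDTree and the approximation of the reference surface by densely sampled points are assumptions for illustration; the paper does not specify its metric implementation.

```python
# Hedged sketch of Chamfer and P2S distances; not the authors' code.
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    """Symmetric Chamfer distance between point sets of shape (M, 3) and (K, 3)."""
    d_ab, _ = cKDTree(pts_b).query(pts_a)  # each point in A to its nearest in B
    d_ba, _ = cKDTree(pts_a).query(pts_b)  # each point in B to its nearest in A
    return d_ab.mean() + d_ba.mean()

def p2s_distance(src_pts: np.ndarray, ref_surface_pts: np.ndarray) -> float:
    """One-directional point-to-surface distance, with the reference surface
    approximated here by densely sampled points (an assumption)."""
    d, _ = cKDTree(ref_surface_pts).query(src_pts)
    return float(d.mean())
```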
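
For the 'FOF to mesh' procedure noted in the Pseudocode row: the paper represents occupancy along each camera ray as a truncated Fourier series, so decoding amounts to evaluating fixed basis functions on a depth grid and extracting an isosurface. The sketch below is not the authors' implementation; the channel layout (one DC term plus N cosine and N sine terms, N = 15), the 0.5 isosurface level, and the skimage marching-cubes call are assumptions for illustration.

```python
# Hedged sketch of FOF-to-mesh decoding, assuming a (2N+1)-channel
# coefficient image over z in [-1, 1]; layout and threshold are assumptions.
import numpy as np
from skimage.measure import marching_cubes

def fof_to_occupancy(coeffs: np.ndarray, depth: int = 256) -> np.ndarray:
    """coeffs: (2N+1, H, W) Fourier coefficients -> (depth, H, W) occupancy."""
    n_terms = (coeffs.shape[0] - 1) // 2            # N, e.g. 15 in the paper
    z = np.linspace(-1.0, 1.0, depth)               # depth samples along the ray
    n = np.arange(1, n_terms + 1)
    # Basis matrix (depth, 2N+1): [1/2, cos(n*pi*z), sin(n*pi*z)]
    basis = np.concatenate(
        [np.full((depth, 1), 0.5),
         np.cos(np.pi * z[:, None] * n[None, :]),
         np.sin(np.pi * z[:, None] * n[None, :])], axis=1)
    # Evaluate the series at every pixel with one matrix product.
    return np.einsum("dc,chw->dhw", basis, coeffs)

def occupancy_to_mesh(occ: np.ndarray, level: float = 0.5):
    """Extract the isosurface of the reconstructed occupancy volume."""
    verts, faces, normals, _ = marching_cubes(occ, level=level)
    return verts, faces, normals
```

Because the basis is fixed, the whole decode is a single matrix product per frame, which is consistent with the paper's real-time claim.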
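
The foreground-supervised L1 loss quoted in the Experiment Setup row can be written compactly. The sketch below is an assumption of how such masking is commonly implemented in PyTorch (the paper gives no code); pred and target stand for predicted and ground-truth FOF coefficient images, and mask for a binary human-foreground matte.

```python
# Hedged sketch of an L1 loss restricted to the human foreground;
# tensor shapes and normalization are assumptions, not the authors' code.
import torch

def foreground_l1_loss(pred: torch.Tensor,
                       target: torch.Tensor,
                       mask: torch.Tensor) -> torch.Tensor:
    """pred, target: (B, 2N+1, H, W) FOF coefficients; mask: (B, 1, H, W)."""
    diff = (pred - target).abs() * mask  # zero out background pixels
    # Normalize by the number of supervised entries so the loss scale
    # does not depend on how much of the image the person occupies.
    return diff.sum() / (mask.sum() * pred.shape[1]).clamp(min=1.0)
```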