Generative Neural Articulated Radiance Fields
Authors: Alexander Bergman, Petr Kellnhofer, Wang Yifan, Eric Chan, David Lindell, Gordon Wetzstein
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We propose a solution to these challenges by developing a 3D GAN framework that learns to generate radiance fields of human bodies or faces in a canonical pose and warp them using an explicit deformation field into a desired body pose or facial expression. Using our framework, we demonstrate the first high-quality radiance field generation results for human bodies. Moreover, we show that our deformation-aware training procedure significantly improves the quality of generated bodies or faces when editing their poses or facial expressions compared to a 3D GAN that is not trained with explicit deformations. We first evaluate the proposed deformation field by overfitting a single representation on a single dynamic full body scene. Then we apply this deformation method in a GAN training pipeline for both bodies (AIST++ [23] and SURREAL [120]) and faces (FFHQ [104]). (A minimal sketch of the canonical-to-posed warping idea follows the table.) |
| Researcher Affiliation | Academia | Alexander W. Bergman (Stanford University, awb@stanford.edu); Petr Kellnhofer (TU Delft, p.kellnhofer@tudelft.nl); Wang Yifan (Stanford University, yifan.wang@stanford.edu); Eric R. Chan (Stanford University, erchan@stanford.edu); David B. Lindell (University of Toronto, Vector Institute, lindell@cs.toronto.edu); Gordon Wetzstein (Stanford University, gordonwz@stanford.edu) |
| Pseudocode | No | The paper describes methods in text and uses diagrams, but does not include any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm', nor are there structured step-by-step procedures formatted like code. |
| Open Source Code | No | In the ethics checklist (3.a), the authors state: 'Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] We plan to release the code completely, but have not yet with submission.' |
| Open Datasets | Yes | AIST++ is a large dataset consisting of 10.1M images capturing 30 performers in dance motion. Each frame is annotated with a ground truth camera and fitted SMPL body model. SURREAL contains 6M images of synthetic humans created using SMPL body models in various poses rendered in indoor scenes. FFHQ is a large dataset of high-resolution images of human faces collected from Flickr. All images have licenses that allow free use, redistribution, and adaptation for non-commercial use. |
| Dataset Splits | Yes | We select a multi-view video sequence from the AIST++ dataset [23] and optimize tri-plane features in the canonical pose using a subset of the views and frames for supervision. We then evaluate the quality of the estimated radiance field warped into these training views and poses, but also into held-out test views and poses. AIST++ is a challenging dataset as the body poses are extremely diverse. We collect 30 frames per video as our training data after filtering out frames whose camera distance is above a threshold or whose human bounding box is partially outside the image. Then we extract the human body by cropping a 600 × 600 patch centered at the pelvis joint, and resize these frames to 256 × 256. (A preprocessing sketch follows the table.) |
| Hardware Specification | Yes | The timings are measured to deform a single feature volume on an RTX 3090 graphics processing unit. |
| Software Dependencies | No | The paper mentions software tools like 'Open3D library [122]', 'MMPose Project [125]', and 'DECA [128]' but does not specify their version numbers, which are needed for reproducible software dependencies. |
| Experiment Setup | Yes | Training details and hyper-parameters are discussed in the supplement. Rather than initializing network weights randomly, the authors begin training from a pre-trained EG3D [1] model; fine-tuning allows for quicker convergence and saves computational resources during training. Similarly to AIST++, they use transfer learning from a pre-trained EG3D model at the appropriate resolution. (A sketch of this fine-tuning initialization follows the table.) |
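The Research Type excerpt describes the core mechanism: generate a radiance field in a canonical pose, then warp it into a target body pose or expression. As a rough illustration of the backward-warping idea (query points in posed space are mapped back to canonical space before sampling the field), here is a minimal PyTorch sketch using inverse linear blend skinning. The paper itself evaluates several explicit deformation fields, so LBS here is only a stand-in, and all names (`inverse_lbs`, `canonical_field`, the tensor shapes) are hypothetical.

```python
import torch

def inverse_lbs(x_posed, bone_transforms, skin_weights):
    """Map query points from posed space back to the canonical pose via
    inverse linear blend skinning. bone_transforms: (J, 4, 4) canonical-
    to-posed bone transforms; skin_weights: (N, J) per-point blend weights."""
    # Blend the per-bone transforms for each point, then invert the blend.
    T = torch.einsum('nj,jab->nab', skin_weights, bone_transforms)  # (N, 4, 4)
    ones = torch.ones_like(x_posed[:, :1])
    x_h = torch.cat([x_posed, ones], dim=-1)  # homogeneous coordinates
    x_canonical = torch.einsum('nab,nb->na', torch.linalg.inv(T), x_h)
    return x_canonical[:, :3]

def sample_deformed_field(x_posed, bone_transforms, skin_weights, canonical_field):
    """Query the generator's canonical radiance field at backward-warped
    points; `canonical_field` is a hypothetical callable returning
    (density, color) for canonical-space coordinates."""
    x_can = inverse_lbs(x_posed, bone_transforms, skin_weights)
    return canonical_field(x_can)
```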
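The Dataset Splits excerpt gives a concrete preprocessing recipe for AIST++: crop a 600 × 600 patch centered on the pelvis joint and resize it to 256 × 256. A minimal sketch of that step, assuming the pelvis has already been projected to 2D pixel coordinates (the function name, file path, and argument names are placeholders):

```python
from PIL import Image

def crop_and_resize(frame_path, pelvis_xy, crop_size=600, out_size=256):
    """Crop a square patch centered on the 2D pelvis joint and resize it,
    mirroring the AIST++ preprocessing quoted above."""
    img = Image.open(frame_path)
    cx, cy = pelvis_xy
    half = crop_size // 2
    # PIL fills any out-of-bounds region of the crop box with black.
    patch = img.crop((int(cx - half), int(cy - half),
                      int(cx + half), int(cy + half)))
    return patch.resize((out_size, out_size), Image.LANCZOS)
```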
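The Experiment Setup row notes that training starts from a pre-trained EG3D checkpoint rather than random initialization. A hedged sketch of such a transfer-learning step, with a stand-in module and a hypothetical checkpoint filename (EG3D's public release ships pickled networks, so the real loading code would differ):

```python
import torch
import torch.nn as nn

class TriPlaneGenerator(nn.Module):
    """Stand-in for the EG3D tri-plane generator; defined only so the
    loading step below runs in isolation."""
    def __init__(self):
        super().__init__()
        self.backbone = nn.Linear(512, 512)    # placeholder for the StyleGAN2 backbone
        self.deformation = nn.Linear(72, 512)  # placeholder for newly added pose modules

generator = TriPlaneGenerator()
# Hypothetical checkpoint path, standing in for a pre-trained EG3D model.
state = torch.load('eg3d_pretrained.pt', map_location='cpu')
# strict=False keeps the pre-trained weights while leaving modules that
# did not exist in EG3D (e.g. a deformation branch) randomly initialized.
generator.load_state_dict(state, strict=False)
```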