Learning to Infer and Execute 3D Shape Programs
Authors: Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our model accurately infers and executes 3D shape programs for highly complex shapes from various categories. It can also be integrated with an image-to-shape module to infer 3D shape programs directly from an RGB image, leading to 3D shape reconstructions that are both more accurate and more physically plausible. |
| Researcher Affiliation | Collaboration | Massachusetts Institute of Technology; Princeton University; Google Research. {yonglong,aluo,ellisk,billf,jbt,jiajunwu}@mit.edu; xs5@princeton.edu |
| Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. It describes the methods in text and with diagrams. |
| Open Source Code | No | Project page: http://shape2prog.csail.mit.edu |
| Open Datasets | Yes | Experiments on ShapeNet (Chang et al., 2015) and Pix3D (Sun et al., 2018b). |
| Dataset Splits | Yes | The synthetic training set includes 100,000 chairs and 100,000 tables. The generator is evaluated on 5,000 chairs and tables. ...Our program executor is trained on 500,000 pairs of synthetic block programs and corresponding shapes, and tested on 30,000 pairs. ...For both tables and chairs, we randomly select 1,000 shapes for evaluation and all the remaining ones for guided adaptation. ...We split 80% shapes of each category for guided adaptation and the remaining for evaluation. |
| Hardware Specification | Yes | Our model takes 5 ms to infer a shape program with a Titan X GPU. |
| Software Dependencies | No | All components of our model are trained with Adam (Kingma & Ba, 2015). |
| Experiment Setup | Yes | Our model is first pretrained on the synthetic dataset and subsequently adapted to target datasets such as ShapeNet and Pix3D under the guidance of the neural program executor. All components of our model are trained with Adam (Kingma & Ba, 2015). ...The generator is trained to predict the program token and regress the corresponding arguments via the loss $\ell_{gen} = \sum_{b,i} \big( w_p \, \ell_{cls}(p_{b,i}, \hat{p}_{b,i}) + w_a \, \ell_{reg}(a_{b,i}, \hat{a}_{b,i}) \big)$ ... During training, we minimize the sum of the weighted binary cross-entropy losses over all voxels: $\mathcal{L} = -\sum_{v \in V} \big( w_1 \, y_v \log \hat{y}_v + w_0 \, (1 - y_v) \log(1 - \hat{y}_v) \big)$ (Eq. 1). (See the loss sketch below.) |
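
The quoted setup combines two training terms: a token-classification plus argument-regression loss for the program generator, and a weighted binary cross-entropy over voxels for the program executor. Below is a minimal PyTorch-style sketch of both terms; the framework, tensor shapes, function names (`generator_loss`, `weighted_voxel_bce`), and the use of an L2 term for $\ell_{reg}$ are assumptions made for illustration, not details confirmed by the paper.

```python
# Hedged sketch of the two losses quoted above. Framework (PyTorch),
# tensor shapes, and helper names are assumptions, not from the paper.
import torch
import torch.nn.functional as F

def generator_loss(pred_tokens, pred_args, gt_tokens, gt_args, w_p, w_a):
    """Program-generator loss: cross-entropy over program tokens plus a
    regression term on the corresponding arguments, summed over blocks b
    and steps i (here flattened into the leading dimension N)."""
    # pred_tokens: (N, num_token_classes) logits; gt_tokens: (N,) class ids
    l_cls = F.cross_entropy(pred_tokens, gt_tokens, reduction='sum')
    # pred_args / gt_args: (N, num_args); L2 regression stands in for l_reg
    l_reg = F.mse_loss(pred_args, gt_args, reduction='sum')
    return w_p * l_cls + w_a * l_reg

def weighted_voxel_bce(pred_probs, gt_voxels, w1, w0, eps=1e-7):
    """Weighted binary cross-entropy over all voxels (Eq. 1): occupied
    voxels (y_v = 1) are weighted by w1, empty voxels by w0."""
    pred_probs = pred_probs.clamp(eps, 1.0 - eps)
    loss = -(w1 * gt_voxels * torch.log(pred_probs)
             + w0 * (1.0 - gt_voxels) * torch.log(1.0 - pred_probs))
    return loss.sum()
```

Weighting occupied voxels more heavily than empty ones (w1 > w0) is the usual motivation for this kind of loss when occupancy grids are sparse; the paper's actual weight values are not given in the excerpt above.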