Learning to Infer and Execute 3D Shape Programs

Authors: Yonglong Tian, Andrew Luo, Xingyuan Sun, Kevin Ellis, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments demonstrate that our model accurately infers and executes 3D shape programs for highly complex shapes from various categories. It can also be integrated with an image-to-shape module to infer 3D shape programs directly from an RGB image, leading to 3D shape reconstructions that are both more accurate and more physically plausible.
Researcher Affiliation | Collaboration | Massachusetts Institute of Technology, Princeton University, Google Research; {yonglong,aluo,ellisk,billf,jbt,jiajunwu}@mit.edu, xs5@princeton.edu
Pseudocode | No | The paper does not contain explicitly labeled pseudocode or algorithm blocks. It describes the methods in text and with diagrams.
Open Source Code | No | Project page: http://shape2prog.csail.mit.edu
Open Datasets | Yes | Experiments on ShapeNet (Chang et al., 2015) and Pix3D (Sun et al., 2018b).
Dataset Splits | Yes | The synthetic training set includes 100,000 chairs and 100,000 tables. The generator is evaluated on 5,000 chairs and tables. ... Our program executor is trained on 500,000 pairs of synthetic block programs and corresponding shapes, and tested on 30,000 pairs. ... For both tables and chairs, we randomly select 1,000 shapes for evaluation and all the remaining ones for guided adaptation. ... We split 80% shapes of each category for guided adaptation and the remaining for evaluation. (A hedged split sketch is given after the table.)
Hardware Specification | Yes | Our model takes 5 ms to infer a shape program with a Titan X GPU.
Software Dependencies | No | All components of our model are trained with Adam (Kingma & Ba, 2015).
Experiment Setup | Yes | Our model is first pretrained on the synthetic dataset and subsequently adapted to target datasets such as ShapeNet and Pix3D under the guidance of the neural program executor. All components of our model are trained with Adam (Kingma & Ba, 2015). ... The generator is trained to predict the program token and regress the corresponding arguments via the loss $l_{gen} = \sum_{b,i} \big( w_p\, l_{cls}(p_{b,i}, \hat{p}_{b,i}) + w_a\, l_{reg}(a_{b,i}, \hat{a}_{b,i}) \big)$. ... During training, we minimize the sum of the weighted binary cross-entropy losses over all voxels, $\sum_{v \in V} \big( -w_1\, y_v \log \hat{y}_v - w_0\, (1 - y_v) \log(1 - \hat{y}_v) \big)$ (Eq. 1).
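
For concreteness, below is a minimal PyTorch sketch of the two losses quoted in the Experiment Setup row. It is an illustrative reconstruction, not the authors' code: the tensor shapes, the weight values w1/w0/wp/wa, and the use of an L2 term for l_reg are assumptions made for this sketch.

```python
import torch
import torch.nn.functional as F

def voxel_bce_loss(pred_occ, true_occ, w1=1.0, w0=1.0):
    """Weighted binary cross-entropy over all voxels (Eq. 1).

    pred_occ: predicted occupancy probabilities y_hat, shape (B, D, H, W)
    true_occ: ground-truth occupancies y in {0, 1}, same shape
    w1, w0:   weights on occupied / empty voxels; the paper balances the two
              classes, but the exact values are not quoted above, so these
              defaults are placeholders.
    """
    eps = 1e-7
    pred_occ = pred_occ.clamp(eps, 1.0 - eps)
    loss = -w1 * true_occ * torch.log(pred_occ) \
           - w0 * (1.0 - true_occ) * torch.log(1.0 - pred_occ)
    return loss.sum()

def generator_loss(token_logits, token_targets, arg_preds, arg_targets,
                   wp=1.0, wa=1.0):
    """Program-generator loss l_gen = sum_{b,i} [wp * l_cls + wa * l_reg].

    token_logits:  (B, T, num_token_types) unnormalized scores per program step
    token_targets: (B, T) ground-truth token indices p_{b,i}
    arg_preds:     (B, T, max_args) predicted arguments a_hat_{b,i}
    arg_targets:   (B, T, max_args) ground-truth arguments a_{b,i}
    l_cls is cross-entropy on tokens; l_reg is taken to be an L2 (MSE) term
    here, which is an assumption of this sketch.
    """
    l_cls = F.cross_entropy(token_logits.flatten(0, 1),
                            token_targets.flatten(), reduction="sum")
    l_reg = F.mse_loss(arg_preds, arg_targets, reduction="sum")
    return wp * l_cls + wa * l_reg
```

Both terms would then be minimized with Adam, consistent with the Software Dependencies row.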
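
The 80%/20% per-category split mentioned in the Dataset Splits row could be reproduced along the lines of the sketch below; the helper name and the fixed seed are ours, and the paper only states the proportions.

```python
import random

def split_category(shape_ids, adapt_frac=0.8, seed=0):
    """Randomly split one category's shapes into a guided-adaptation set
    (80%) and an evaluation set (20%), as described for the Pix3D categories.
    `shape_ids` is any list of shape identifiers; the function name and the
    seed are illustrative, not from the paper."""
    rng = random.Random(seed)
    ids = list(shape_ids)
    rng.shuffle(ids)
    cut = int(adapt_frac * len(ids))
    return ids[:cut], ids[cut:]
```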