Learning to Describe Scenes with Programs

Authors: Yunchao Liu, Zheng Wu, Daniel Ritchie, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu

ICLR 2019

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental
    "We perform several experiments on synthetic scene images, including quantitative comparison with baseline methods and further extensions and applications. We further demonstrate our model's ability to generalize to real images with a small amount of hand-labeled supervision, which is only at the object level."
Researcher Affiliation: Collaboration
    Yunchao Liu (IIIS, Tsinghua University); Zheng Wu (MIT CSAIL and Shanghai Jiao Tong University); Daniel Ritchie (Brown University); William T. Freeman (MIT CSAIL and Google Research); Joshua B. Tenenbaum (MIT CSAIL); Jiajun Wu (MIT CSAIL)
Pseudocode: Yes
    Algorithm 1, "Combining group prediction with program synthesis" (a hedged sketch of this combination appears after the table)
Open Source Code: No
    The paper does not contain any explicit statement about making its source code available or providing a link to a code repository.
Open Datasets: Yes
    "We create a synthetic dataset of images rendered from complex scenes with rich program structures. ... We train and test the models on two synthetic datasets, REGULAR and RANDOM, each containing 20,000 training and 500 test images... These images are generated by first sampling scenes and then rendering using the same renderer as in CLEVR (Johnson et al., 2017)."
Dataset Splits: Yes
    "We create a dataset of 120 real images, where we use 90 for training, 10 for validation, and 20 for testing." (a split sketch follows the table)
Hardware Specification: No
    The paper does not provide any specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies: No
    The paper mentions several software components and models, such as Mask R-CNN, ResNet-34, ResNet-152, LSTM, and pix2pix, but it does not specify version numbers for these dependencies or the programming language used. (an assumed dependency sketch follows the table)
Experiment Setup: Yes
    "For synthetic data rendering, we use essentially the same settings as in CLEVR (Johnson et al., 2017). The objects are in two sizes (radius 0.4, 0.7), three shapes (sphere, cube, cylinder), two materials (metal, rubber), and eight colors (blue, brown, cyan, gray, green, purple, red, yellow)." (a sampling sketch follows the table)
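
The paper's Algorithm 1 is not reproduced on this page, but the idea of combining a learned group proposer with a search-based program synthesizer can be illustrated. Below is a minimal Python sketch under strong assumptions: `predict_group` stands in for the neural group predictor (here a crude same-shape-and-color heuristic), and `fit_line_program` stands in for program synthesis over a toy space of one-dimensional for-loop layouts. None of these names or design choices come from the paper.

```python
# Hypothetical sketch of "combining group prediction with program synthesis"
# (Algorithm 1). Every name here is an illustrative stand-in, not the
# authors' code: a real system would use a neural group predictor and a
# much richer program space.

def predict_group(objects):
    """Stand-in for the neural group predictor: group objects that share
    the anchor's shape and color (a crude heuristic, for illustration)."""
    anchor = objects[0]
    return [o for o in objects
            if o["shape"] == anchor["shape"] and o["color"] == anchor["color"]]

def fit_line_program(group, tol=0.1):
    """Stand-in for program synthesis: try to explain the group as a
    for-loop placing objects at start + i * step along the x-axis."""
    xs = sorted(o["x"] for o in group)
    if len(xs) < 2:
        return None
    step = (xs[-1] - xs[0]) / (len(xs) - 1)
    if all(abs(x - (xs[0] + i * step)) <= tol for i, x in enumerate(xs)):
        return {"type": "for", "n": len(xs), "start": xs[0], "step": step}
    return None

def describe_scene(objects):
    """Greedy loop in the spirit of Algorithm 1: propose a group,
    synthesize a program for it, remove the described objects, repeat."""
    remaining, programs = list(objects), []
    while remaining:
        group = predict_group(remaining)
        program = fit_line_program(group)
        if program is None:           # fall back to a single-object program
            group, program = [remaining[0]], {"type": "single"}
        programs.append((program, group))
        remaining = [o for o in remaining if o not in group]
    return programs

# Toy usage: three red cubes evenly spaced along x become one for-loop program.
scene = [{"shape": "cube", "color": "red", "x": float(i)} for i in range(3)]
print(describe_scene(scene))
```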
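For the Dataset Splits row, the quoted 90/10/20 partition of 120 real images is easy to mirror. A minimal sketch follows; the filename pattern and the fixed seed are assumptions, since the paper specifies only the split sizes.

```python
import random

# 90/10/20 split over 120 real images, per the quoted sentence above.
# Filenames and the seed are assumptions; the paper gives only the counts.
image_ids = [f"real_{i:03d}.png" for i in range(120)]

rng = random.Random(0)   # fixed seed so the split is reproducible
rng.shuffle(image_ids)

train, val, test = image_ids[:90], image_ids[90:100], image_ids[100:]
assert (len(train), len(val), len(test)) == (90, 10, 20)
```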
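For the Software Dependencies row: since the paper names its components but not a framework or versions, the sketch below assumes a PyTorch/torchvision stack. Note that torchvision's off-the-shelf Mask R-CNN uses a ResNet-50 backbone rather than the ResNet-34/152 the paper mentions, so this is only a rough stand-in; recording the exact versions in use is the point.

```python
import torch
import torchvision

# Assumed PyTorch/torchvision stack; the paper pins no framework or versions.
feature_net = torchvision.models.resnet34(pretrained=True)    # attribute features (assumed role)
detector = torchvision.models.detection.maskrcnn_resnet50_fpn(
    pretrained=True)             # object masks; ResNet-50, not the paper's backbone
decoder = torch.nn.LSTM(input_size=256, hidden_size=256)      # program-token decoder; sizes are guesses

# Logging exact versions is the minimal fix for the missing-dependency issue.
print("torch", torch.__version__, "| torchvision", torchvision.__version__)
```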
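For the Experiment Setup row, the quoted attribute sets fully specify a sampler's value spaces. Here is a minimal sketch assuming uniform sampling (the paper does not state the distribution) and a guessed placement range; only the value sets come from the paper.

```python
import random

SIZES = [0.4, 0.7]                       # radii, as quoted
SHAPES = ["sphere", "cube", "cylinder"]
MATERIALS = ["metal", "rubber"]
COLORS = ["blue", "brown", "cyan", "gray",
          "green", "purple", "red", "yellow"]

def sample_object(rng):
    """Draw one CLEVR-style object; uniform choices and the (x, y) range
    are assumptions, only the attribute value sets come from the paper."""
    return {
        "size": rng.choice(SIZES),
        "shape": rng.choice(SHAPES),
        "material": rng.choice(MATERIALS),
        "color": rng.choice(COLORS),
        "x": rng.uniform(-3.0, 3.0),     # placement range is a guess
        "y": rng.uniform(-3.0, 3.0),
    }

rng = random.Random(42)
scene = [sample_object(rng) for _ in range(rng.randint(3, 10))]
```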