Learning to Describe Scenes with Programs
Authors: Yunchao Liu, Zheng Wu, Daniel Ritchie, William T. Freeman, Joshua B. Tenenbaum, Jiajun Wu
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform several experiments on synthetic scene images, including quantitative comparison with baseline methods and further extensions and applications. We further demonstrate our model's ability to generalize to real images with a small amount of hand-labeled supervision which is only at the object level. |
| Researcher Affiliation | Collaboration | Yunchao Liu (IIIS, Tsinghua University); Zheng Wu (MIT CSAIL, Shanghai Jiao Tong University); Daniel Ritchie (Brown University); William T. Freeman (MIT CSAIL, Google Research); Joshua B. Tenenbaum (MIT CSAIL); Jiajun Wu (MIT CSAIL) |
| Pseudocode | Yes | Algorithm 1: Combining group prediction with program synthesis |
| Open Source Code | No | The paper does not contain any explicit statement about making its source code available or providing a link to a code repository. |
| Open Datasets | Yes | We create a synthetic dataset of images rendered from complex scenes with rich program structures. ... We train and test the models on two synthetic datasets, REGULAR and RANDOM, each containing 20,000 training and 500 test images... These images are generated by first sampling scenes and then rendering using the same renderer as in CLEVR (Johnson et al., 2017). |
| Dataset Splits | Yes | We create a dataset of 120 real images, where we use 90 for training, 10 for validation, and 20 for testing. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions several software components and models like 'Mask R-CNN', 'ResNet-34', 'ResNet-152', 'LSTM', and 'pix2pix', but it does not specify any version numbers for these software dependencies or the programming language used. |
| Experiment Setup | Yes | For synthetic data rendering, we use essentially the same settings as in CLEVR (Johnson et al., 2017). The objects are in two sizes (radius 0.4, 0.7), three shapes (sphere, cube, cylinder), two materials (metal, rubber), and eight colors (blue, brown, cyan, gray, green, purple, red, yellow). |
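
The object attribute space quoted in the Experiment Setup row can be written out as a small sketch. This is only an illustration of the listed values, not the paper's scene generator: the names `all_object_types` and `sample_object` are hypothetical, and uniform sampling over attributes is an assumption (the paper samples whole scenes with program structure, which is not modeled here).

```python
import itertools
import random

# Attribute values as listed in the Experiment Setup row.
RADII = (0.4, 0.7)
SHAPES = ("sphere", "cube", "cylinder")
MATERIALS = ("metal", "rubber")
COLORS = ("blue", "brown", "cyan", "gray", "green", "purple", "red", "yellow")


def all_object_types():
    """Enumerate every (radius, shape, material, color) combination."""
    return list(itertools.product(RADII, SHAPES, MATERIALS, COLORS))


def sample_object(rng):
    """Draw one object uniformly at random (assumed; not the paper's sampler)."""
    radius, shape, material, color = rng.choice(all_object_types())
    return {"radius": radius, "shape": shape, "material": material, "color": color}


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same draw
    print(len(all_object_types()))  # 2 * 3 * 2 * 8 combinations
    print(sample_object(rng))
```

Under these values there are 2 × 3 × 2 × 8 = 96 distinct object types.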