Guiding Program Synthesis by Learning to Generate Examples
Authors: Larissa Laich, Pavol Bielik, Martin Vechev
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EVALUATION We evaluate our approach by applying it to an existing Android layout synthesizer called InferUI (Bielik et al., 2018) as described in Section 4. ... We show that even if we disable these two optimizations and instead guide the synthesizer purely by extending the input specification with additional input-output examples, we can still achieve an accuracy increase from 35% to 71%. In all our experiments, we evaluate our models and InferUI on a test subset of the DS+ dataset which contains 85 Google Play Store applications, each of which contains the ground truth of the absolute view positions on three different screen dimensions. |
| Researcher Affiliation | Academia | Larissa Laich, Pavol Bielik, Martin Vechev Department of Computer Science ETH Zurich, Switzerland llaich@ethz.ch, {pavol.bielik, martin.vechev}@inf.ethz.ch |
| Pseudocode | No | The paper describes the approach using numbered steps in Section 3 but does not provide a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | We make our implementation and datasets available online at: https://github.com/eth-sri/guiding-synthesizers |
| Open Datasets | Yes | To train our models we obtained three datasets DU, DS and DS+, each containing an increasing amount of information at the expense of being harder to collect. The unsupervised DU = {(x_i, y_i)}_{i=1}^{N} is the simplest dataset and contains only positive input-output samples obtained by sampling 22,000 unique screenshots (including the associated metadata of all the absolute view positions) of Google Play Store applications taken from the Rico dataset (Deka et al., 2017). |
| Dataset Splits | No | The paper describes datasets DU, DS, and DS+ used for training, and states evaluation is performed on a 'test subset of the DS+ dataset'. However, it does not describe how these training datasets are split into training and validation sets, nor does it give the proportions or counts that would be needed to reproduce such splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions using 'Android emulators' without specifying the underlying hardware. |
| Software Dependencies | No | The paper mentions the use of 'the state-of-the-art SMT solver Z3 (De Moura & Bjørner, 2008)' but does not provide a specific version number for Z3 or any other software dependencies like programming languages or machine learning frameworks with their versions. |
| Experiment Setup | Yes | In (CNN), the output is converted to an image ... with 3 convolutional layers with 5×5 filters of size 64, 32 and 16 and max pooling with kernel size 2 and stride 2. We regularize the network during training by positioning the outputs with a random offset... In (MLP), the output is transformed to a normalized feature vector and then fed into a feedforward neural network with 3 hidden layers of size 512 with ReLU activations... Concretely, we define a threshold hyperparameter t ∈ [0,1] which is used to return the first candidate output y for which the probability of the output being correct is above this threshold. (Illustrative sketches of these components follow the table.) |
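
To make the Experiment Setup row concrete, below is a minimal sketch of the two candidate-scoring models described in the quoted text. The layer sizes are taken from the quote (three 5×5 convolutions with 64, 32 and 16 filters, 2×2 max pooling with stride 2; three hidden layers of size 512 with ReLU). Everything else is an assumption for illustration, not the authors' implementation: the choice of PyTorch, the input channel count and resolution, the sigmoid classification head, and the names `CandidateScorerCNN` / `CandidateScorerMLP` are all hypothetical.

```python
import torch
import torch.nn as nn

class CandidateScorerCNN(nn.Module):
    """Scores a candidate layout rendered as an image.

    Conv/pooling sizes follow the paper's description; the input
    channels, padding, and classifier head are assumptions.
    """
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(1),  # infers the flattened feature size on first use
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Probability that the rendered candidate layout is correct.
        return torch.sigmoid(self.head(self.features(x)))

class CandidateScorerMLP(nn.Module):
    """MLP variant: three hidden layers of size 512 with ReLU, as in
    the quote; the input feature dimension is an assumption."""
    def __init__(self, in_features: int):
        super().__init__()
        layers, dim = [], in_features
        for _ in range(3):
            layers += [nn.Linear(dim, 512), nn.ReLU()]
            dim = 512
        layers.append(nn.Linear(dim, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x))

# Hypothetical usage: score one rendered candidate at 128x128 resolution.
model = CandidateScorerCNN()
p = model(torch.rand(1, 3, 128, 128))  # shape (1, 1), value in (0, 1)
```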
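The same row quotes the threshold hyperparameter t ∈ [0,1]: the synthesizer returns the first candidate output whose predicted probability of being correct exceeds t. Below is a hedged sketch of that selection loop; the fallback to the highest-scoring candidate when no candidate passes the threshold is our assumption, since the quoted text only describes the thresholding step itself.

```python
def select_candidate(candidates, score, t=0.5):
    """Return the first candidate y with score(y) > t, where score gives
    the predicted probability that y is a correct layout.

    The fallback to the best-scoring candidate is an assumption, not
    taken from the paper.
    """
    best, best_p = None, -1.0
    for y in candidates:
        p = score(y)  # e.g. CandidateScorerCNN applied to a rendering of y
        if p > t:
            return y  # first sufficiently confident candidate
        if p > best_p:
            best, best_p = y, p
    return best
```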