Guiding Program Synthesis by Learning to Generate Examples
Authors: Larissa Laich, Pavol Bielik, Martin Vechev
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 5 EVALUATION We evaluate our approach by applying it to an existing Android layout synthesizer called InferUI (Bielik et al., 2018) as described in Section 4. ... We show that even if we disable these two optimizations and instead guide the synthesizer purely by extending the input specification with additional input-output examples, we can still achieve an accuracy increase from 35% to 71%. In all our experiments, we evaluate our models and InferUI on a test subset of the DS+ dataset which contains 85 Google Play Store applications, each of which contains the ground truth of the absolute view positions on three different screen dimensions. |
| Researcher Affiliation | Academia | Larissa Laich, Pavol Bielik, Martin Vechev Department of Computer Science ETH Zurich, Switzerland llaich@ethz.ch, {pavol.bielik, martin.vechev}@inf.ethz.ch |
| Pseudocode | No | The paper describes the approach using numbered steps in Section 3 but does not provide a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | We make our implementation and datasets available online at: https://github.com/eth-sri/guiding-synthesizers |
| Open Datasets | Yes | To train our models we obtained three datasets DU, DS and DS+, each containing an increasing amount of information at the expense of being harder to collect. The unsupervised DU = {(x_i, y_i)}_{i=1}^{N} is the simplest dataset and contains only positive input-output samples obtained by sampling 22,000 unique screenshots (including the associated metadata of all the absolute view positions) of Google Play Store applications taken from the Rico dataset (Deka et al., 2017). |
| Dataset Splits | No | The paper describes datasets DU, DS, and DS+ used for training, and states evaluation is performed on a 'test subset of the DS+ dataset'. However, it does not describe how these training datasets are split into training and validation sets, nor does it give the proportions or counts that would be needed to reproduce such splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as exact GPU/CPU models, processor types, or memory amounts used for running the experiments. It only mentions using 'Android emulators' without specifying the underlying hardware. |
| Software Dependencies | No | The paper mentions the use of 'the state-of-the-art SMT solver Z3 (De Moura & Bjørner, 2008)' but does not provide a specific version number for Z3 or any other software dependencies like programming languages or machine learning frameworks with their versions. |
| Experiment Setup | Yes | In (CNN), the output is converted to an image ... with 3 convolutional layers with 5×5 filters of size 64, 32 and 16 and max pooling with kernel size 2 and stride 2. We regularize the network during training by positioning the outputs with a random offset... In (MLP), the output is transformed to a normalized feature vector and then fed into a feedforward neural network with 3 hidden layers of size 512 with ReLU activations... Concretely, we define a threshold hyperparameter t ∈ [0,1] which is used to return the first candidate output y for which the probability of the output being correct is above this threshold. (Illustrative sketches of these components follow the table.) |
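
To make the Experiment Setup row concrete, below is a minimal sketch of the two candidate-scoring models described in the quoted text. The layer sizes are taken from the quote (three 5×5 convolutions with 64, 32 and 16 filters, 2×2 max pooling with stride 2; three hidden layers of size 512 with ReLU). Everything else is an assumption for illustration, not the authors' implementation: the choice of PyTorch, the input channel count and resolution, the sigmoid classification head, and the names `CandidateScorerCNN` / `CandidateScorerMLP` are all hypothetical.

```python
import torch
import torch.nn as nn

class CandidateScorerCNN(nn.Module):
    """Scores a candidate layout rendered as an image.

    Conv/pooling sizes follow the paper's description; the input
    channels, padding, and classifier head are assumptions.
    """
    def __init__(self, in_channels: int = 3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(64, 32, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 16, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2, stride=2),
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(1),  # infers the flattened feature size on first use
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Probability that the rendered candidate layout is correct.
        return torch.sigmoid(self.head(self.features(x)))

class CandidateScorerMLP(nn.Module):
    """MLP variant: three hidden layers of size 512 with ReLU, as in
    the quote; the input feature dimension is an assumption."""
    def __init__(self, in_features: int):
        super().__init__()
        layers, dim = [], in_features
        for _ in range(3):
            layers += [nn.Linear(dim, 512), nn.ReLU()]
            dim = 512
        layers.append(nn.Linear(dim, 1))
        self.net = nn.Sequential(*layers)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.sigmoid(self.net(x))

# Hypothetical usage: score one rendered candidate at 128x128 resolution.
model = CandidateScorerCNN()
p = model(torch.rand(1, 3, 128, 128))  # shape (1, 1), value in (0, 1)
```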
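The same row quotes the threshold hyperparameter t ∈ [0,1]: the synthesizer returns the first candidate output whose predicted probability of being correct exceeds t. Below is a hedged sketch of that selection loop; the fallback to the highest-scoring candidate when no candidate passes the threshold is our assumption, since the quoted text only describes the thresholding step itself.

```python
def select_candidate(candidates, score, t=0.5):
    """Return the first candidate y with score(y) > t, where score gives
    the predicted probability that y is a correct layout.

    The fallback to the best-scoring candidate is an assumption, not
    taken from the paper.
    """
    best, best_p = None, -1.0
    for y in candidates:
        p = score(y)  # e.g. CandidateScorerCNN applied to a rendering of y
        if p > t:
            return y  # first sufficiently confident candidate
        if p > best_p:
            best, best_p = y, p
    return best
```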