Neural-Guided Deductive Search for Real-Time Program Synthesis from Examples

Authors: Ashwin Kalyan, Abhishek Mohta, Oleksandr Polozov, Dhruv Batra, Prateek Jain, Sumit Gulwani

ICLR 2018 | Conference PDF | Archive PDF

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we evaluate our NGDS algorithm over the string manipulation domain with a DSL given by Figure 2; see Figure 1 for an example task. We evaluate NGDS, its ablations, and baseline techniques on two key metrics: (a) generalization accuracy on unseen inputs, (b) synthesis time. Dataset. We use a dataset of 375 tasks collected from real-world customer string manipulation problems, split into 65% training, 15% validation, and 20% test data.
Researcher Affiliation | Collaboration | Ashwin K. Vijayakumar & Dhruv Batra, School of Interactive Computing, Georgia Tech, Atlanta, GA 30308, USA, {ashwinkv,dbatra}@gatech.edu; Abhishek Mohta & Prateek Jain, Microsoft Research India, Bengaluru, Karnataka 560001, India, {t-abmoht,prajain}@microsoft.com; Oleksandr Polozov & Sumit Gulwani, Microsoft Research Redmond, Redmond, WA 98052, USA, {polozov,sumitg}@microsoft.com
Pseudocode | Yes | Figure 5: The controllers for guiding the search process to construct a most generalizable ϕ-satisfying program set S of size k, given the f-predicted best scores s1, ..., sn of the productions F1, ..., Fn. Figure 6: Neural-guided deductive search over L, parameterized with a branch selection controller C.
Open Source Code | No | The paper does not explicitly state that the source code for its methodology is publicly available, nor does it provide a link to a code repository.
Open Datasets | No | We use a dataset of 375 tasks collected from real-world customer string manipulation problems, split into 65% training, 15% validation, and 20% test data. Some of the common applications found in our dataset include date/time formatting, manipulating addresses, modifying names, automatically generating email IDs, etc. Each task contains about 10 inputs, of which only one is provided as the spec to the synthesis system, mimicking industrial applications. The remaining unseen examples are used to evaluate generalization performance of the synthesized programs. After running synthesis of top-1 programs with PROSE on all training tasks, we have collected a dataset of 400,000 intermediate search decisions, i.e. triples ⟨production Γ, spec ϕ, a posteriori best score h(P, ϕ)⟩.
Dataset Splits | Yes | We use a dataset of 375 tasks collected from real-world customer string manipulation problems, split into 65% training, 15% validation, and 20% test data.
Hardware Specification | Yes | We run all the methods on the same machine with 2.3 GHz Intel Xeon processor, 64GB of RAM, and Windows Server 2016.
Software Dependencies | No | Finally, we train all our LSTM-based models with CNTK (Seide & Agarwal, 2016) using Adam (Kingma & Ba, 2014) with a learning rate of 10^-2 and a batch size of 32, using early stopping on the validation loss to select the best performing model (thus, 100-600 epochs).
Experiment Setup | Yes | Finally, we train all our LSTM-based models with CNTK (Seide & Agarwal, 2016) using Adam (Kingma & Ba, 2014) with a learning rate of 10^-2 and a batch size of 32, using early stopping on the validation loss to select the best performing model (thus, 100-600 epochs).
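
The Research Type row above names the paper's two evaluation metrics: generalization accuracy on unseen inputs and synthesis time. The following Python sketch shows one way such a harness could be wired up; it is illustrative only, and the synthesize callback and the task record layout are assumptions rather than the authors' evaluation code.

import time

def evaluate(tasks, synthesize):
    # synthesize(inp, out) stands in for any compared system (PROSE, NGDS, or
    # an ablation) and returns a callable top-1 program, or None on failure.
    correct, times = 0, []
    for task in tasks:
        spec_input, spec_output = task["spec"]      # the single provided example
        start = time.perf_counter()
        program = synthesize(spec_input, spec_output)
        times.append(time.perf_counter() - start)
        # A task counts as solved only if the program is right on every unseen example.
        if program is not None and all(program(x) == y for x, y in task["unseen"]):
            correct += 1
    return correct / len(tasks), sum(times) / len(times)  # accuracy, mean synthesis time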
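
The Pseudocode row quotes Figure 5, the controllers that decide which grammar productions to explore given the model-predicted best scores s1, ..., sn. As a rough illustration of that idea, here is a threshold-style branch selection controller in Python; the explore_branch callback, the threshold value, and all names are assumptions, not the authors' implementation.

def threshold_controller(productions, predicted_scores, spec, explore_branch,
                         k=1, threshold=0.2):
    # Explore only the productions whose predicted score is within `threshold`
    # of the best prediction, then keep the k highest-scoring programs found.
    best = max(predicted_scores)
    results = []
    for production, score in zip(productions, predicted_scores):
        if best - score > threshold:
            continue                  # prune branches predicted to be clearly worse
        results.extend(explore_branch(production, spec))  # -> list of (program, score)
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results[:k]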
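
The Open Datasets and Dataset Splits rows report a 65% / 15% / 20% split of the 375 tasks. A minimal sketch of such a split is below; the shuffling, seed, and rounding are assumptions, since the paper does not describe how the split was drawn.

import random

def split_tasks(tasks, train_frac=0.65, valid_frac=0.15, seed=0):
    tasks = list(tasks)
    random.Random(seed).shuffle(tasks)
    n_train = int(round(train_frac * len(tasks)))   # 65% of 375 -> 244 tasks
    n_valid = int(round(valid_frac * len(tasks)))   # 15% of 375 -> 56 tasks
    train = tasks[:n_train]
    valid = tasks[n_train:n_train + n_valid]
    test = tasks[n_train + n_valid:]                # remaining 20% -> 75 tasks
    return train, valid, test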
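
The Software Dependencies and Experiment Setup rows quote the training configuration: LSTM-based models trained with CNTK using Adam, a learning rate of 10^-2, a batch size of 32, and early stopping on the validation loss. The sketch below mirrors that configuration but uses PyTorch as a stand-in for CNTK; the model architecture, loss function, and patience value are assumptions.

import torch
import torch.nn as nn

class ScoreLSTM(nn.Module):
    # Tiny LSTM regressor standing in for the paper's score-prediction models.
    def __init__(self, vocab_size, embed_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, 1)

    def forward(self, tokens):
        _, (hidden, _) = self.lstm(self.embed(tokens))
        return self.head(hidden[-1]).squeeze(-1)

def train(model, train_loader, valid_loader, max_epochs=600, patience=20):
    # Adam with lr 1e-2; batch size 32 is assumed to be set in the data loaders.
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.MSELoss()
    best_loss, best_state, stale = float("inf"), None, 0
    for _ in range(max_epochs):
        model.train()
        for tokens, target in train_loader:
            opt.zero_grad()
            loss_fn(model(tokens), target).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            valid_loss = sum(loss_fn(model(t), y).item() for t, y in valid_loader)
        if valid_loss < best_loss:
            best_loss, stale = valid_loss, 0
            best_state = {k: v.detach().clone() for k, v in model.state_dict().items()}
        else:
            stale += 1
            if stale >= patience:          # early stopping on the validation loss
                break
    if best_state is not None:
        model.load_state_dict(best_state)  # keep the best-performing model
    return model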