Execution-Guided Neural Program Synthesis
Authors: Xinyun Chen, Chang Liu, Dawn Song
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our techniques on the Karel task (Bunel et al., 2018; Devlin et al., 2017a), the largest publicly available benchmark for input-output program synthesis, on which the most performant model in the past can achieve only an accuracy of around 77% (Bunel et al., 2018). We observe that our proposed techniques can gain better performance than the previous state-of-the-art results. In particular, by combining both of our techniques, we can achieve an accuracy of 92%, which is around 15 percentage points better than the state-of-the-art results. |
| Researcher Affiliation | Collaboration | Xinyun Chen UC Berkeley Chang Liu Citadel Securities Dawn Song UC Berkeley |
| Pseudocode | Yes | Algorithm 1 Execution-guided synthesis (sequential case) 1: function Exec(Γ, {(s_i^k, s_o^k)}_{k=1}^K) 2: // The main algorithm is called using Exec(Γ, {IO^K}) 3: P ← ⟨⟩ 4: S ← Γ({(s_i^k, s_o^k)}_{k=1}^K) 5: while S ≠ end-token do 6: if S = if-token then // If-statement synthesis 7: S, {(s_i^k, s_o^k)}_{k=1}^K ← ExecIf(Γ, {(s_i^k, s_o^k)}_{k=1}^K) 8: else 9: if S = while-token then // While-statement synthesis 10: S, {(s_i^k, s_o^k)}_{k=1}^K ← ExecWhile(Γ, {(s_i^k, s_o^k)}_{k=1}^K) 11: else // Execution of S 12: ⟨S, s_i^k⟩ → s_new^k for k = 1, ..., K 13: s_i^k ← s_new^k for k = 1, ..., K 14: end if 15: end if 16: P ← P; S 17: S ← Γ({(s_i^k, s_o^k)}_{k=1}^K) 18: end while 19: return P 20: end function (see the Python sketch after this table) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing its code or a link to a code repository. |
| Open Datasets | Yes | We evaluate our techniques on the Karel task (Pattis, 1981; Bunel et al., 2018). We train and evaluate our approaches on their dataset, which is built by randomly sampling programs from the DSL. |
| Dataset Splits | Yes | In total, there are 1,116,854 programs for training, 2,500 in the validation set, and 2,500 in the test set. |
| Hardware Specification | No | The paper mentions neural network architectures like CNNs and LSTMs, but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'REINFORCE', citing their respective papers (Kingma & Ba, 2015) and (Williams, 1992), but does not provide specific version numbers for any software libraries, frameworks, or compilers used (e.g., TensorFlow 2.x, PyTorch 1.x, Python 3.x). |
| Experiment Setup | Yes | The learning rate of supervised training is 10^-4, and the learning rate of reinforcement learning is 10^-5. We set the batch size to be 128 for supervised training, and 16 for RL training. |
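
Below is a minimal Python sketch of the execution-guided loop from the Algorithm 1 pseudocode quoted in the "Pseudocode" row. The names are illustrative assumptions, not the paper's code: `model` stands for the synthesizer Γ, `execute` for the Karel statement executor, and `exec_if` / `exec_while` for the paper's ExecIf / ExecWhile subroutines.

```python
# Sketch of Algorithm 1 (sequential case), assuming callables for the
# synthesizer, the statement executor, and the if/while subroutines.

IF_TOKEN = "if"
WHILE_TOKEN = "while"
END_TOKEN = "<end>"


def exec_guided_synthesis(model, io_pairs, execute, exec_if, exec_while):
    """Emit one statement at a time, executing it on the current input states
    so the model always predicts from up-to-date intermediate states."""
    program = []
    stmt = model(io_pairs)                        # S <- Gamma({(s_i^k, s_o^k)})
    while stmt != END_TOKEN:
        if stmt == IF_TOKEN:                      # if-statement synthesis
            stmt, io_pairs = exec_if(model, io_pairs)
        elif stmt == WHILE_TOKEN:                 # while-statement synthesis
            stmt, io_pairs = exec_while(model, io_pairs)
        else:                                     # plain statement: run it
            io_pairs = [(execute(stmt, s_in), s_out)
                        for s_in, s_out in io_pairs]
        program.append(stmt)                      # P <- P; S
        stmt = model(io_pairs)                    # re-query on updated states
    return program
```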
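
The next sketch collects the training settings quoted in the "Software Dependencies" and "Experiment Setup" rows (Adam optimizer, learning rates of 10^-4 for supervised training and 10^-5 for RL, batch sizes of 128 and 16). The paper does not name its framework, so PyTorch and the `synthesizer` model object here are assumptions.

```python
# Hedged sketch of the reported optimizer and REINFORCE settings; the
# framework (PyTorch) and the `synthesizer` object are assumed, not stated
# in the paper.
import torch

SUPERVISED_LR, SUPERVISED_BATCH = 1e-4, 128
RL_LR, RL_BATCH = 1e-5, 16


def make_optimizer(synthesizer, reinforcement: bool) -> torch.optim.Adam:
    """Adam optimizer with the learning rate reported for each training phase."""
    lr = RL_LR if reinforcement else SUPERVISED_LR
    return torch.optim.Adam(synthesizer.parameters(), lr=lr)


def reinforce_loss(log_probs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE (Williams, 1992): weight the negative log-likelihood of each
    sampled program by its reward (e.g., 1 if it satisfies all IO pairs)."""
    return -(log_probs * rewards).mean()
```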