Execution-Guided Neural Program Synthesis
Authors: Xinyun Chen, Chang Liu, Dawn Song
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our techniques on the Karel task (Bunel et al., 2018; Devlin et al., 2017a), the largest publicly available benchmark for input-output program synthesis, on which the most performant model in the past can achieve only an accuracy of around 77% (Bunel et al., 2018). We observe that our proposed techniques can gain better performance than the previous state-of-the-art results. In particular, by combining both of our techniques, we can achieve an accuracy of 92%, which is around 15 percentage points better than the state-of-the-art results. |
| Researcher Affiliation | Collaboration | Xinyun Chen UC Berkeley Chang Liu Citadel Securities Dawn Song UC Berkeley |
| Pseudocode | Yes | Algorithm 1 Execution-guided synthesis (sequential case) 1: function Exec(Γ, {(s_i^k, s_o^k)}_{k=1}^K) 2: // The main algorithm is called using Exec(Γ, {IO^K}) 3: P ← ⟨⟩ 4: S ← Γ({(s_i^k, s_o^k)}_{k=1}^K) 5: while S ≠ end-token do 6: if S = if-token then // If-statement synthesis 7: S, {(s_i^k, s_o^k)}_{k=1}^K ← ExecIf(Γ, {(s_i^k, s_o^k)}_{k=1}^K) 8: else 9: if S = while-token then // While-statement synthesis 10: S, {(s_i^k, s_o^k)}_{k=1}^K ← ExecWhile(Γ, {(s_i^k, s_o^k)}_{k=1}^K) 11: else // Execution of S 12: ⟨S, s_i^k⟩ → s_new^k for k = 1, ..., K 13: s_i^k ← s_new^k for k = 1, ..., K 14: end if 15: end if 16: P ← P; S 17: S ← Γ({(s_i^k, s_o^k)}_{k=1}^K) 18: end while 19: return P 20: end function (see the Python sketch after this table) |
| Open Source Code | No | The paper does not provide an explicit statement about open-sourcing its code or a link to a code repository. |
| Open Datasets | Yes | We evaluate our techniques on the Karel task (Pattis, 1981; Bunel et al., 2018). We train and evaluate our approaches on their dataset, which is built by randomly sampling programs from the DSL. |
| Dataset Splits | Yes | In total, there are 1,116,854 programs for training, 2,500 in the validation set, and 2,500 in the test set. |
| Hardware Specification | No | The paper mentions neural network architectures like CNNs and LSTMs, but does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using the 'Adam optimizer' and 'REINFORCE', citing their respective papers (Kingma & Ba, 2015) and (Williams, 1992), but does not provide specific version numbers for any software libraries, frameworks, or compilers used (e.g., TensorFlow 2.x, PyTorch 1.x, Python 3.x). |
| Experiment Setup | Yes | The learning rate of supervised training is 10^-4, and the learning rate of reinforcement learning is 10^-5. We set the batch size to be 128 for supervised training, and 16 for RL training. |
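
Below is a minimal Python sketch of the execution-guided loop from the Algorithm 1 pseudocode quoted in the "Pseudocode" row. The names are illustrative assumptions, not the paper's code: `model` stands for the synthesizer Γ, `execute` for the Karel statement executor, and `exec_if` / `exec_while` for the paper's ExecIf / ExecWhile subroutines.

```python
# Sketch of Algorithm 1 (sequential case), assuming callables for the
# synthesizer, the statement executor, and the if/while subroutines.

IF_TOKEN = "if"
WHILE_TOKEN = "while"
END_TOKEN = "<end>"


def exec_guided_synthesis(model, io_pairs, execute, exec_if, exec_while):
    """Emit one statement at a time, executing it on the current input states
    so the model always predicts from up-to-date intermediate states."""
    program = []
    stmt = model(io_pairs)                        # S <- Gamma({(s_i^k, s_o^k)})
    while stmt != END_TOKEN:
        if stmt == IF_TOKEN:                      # if-statement synthesis
            stmt, io_pairs = exec_if(model, io_pairs)
        elif stmt == WHILE_TOKEN:                 # while-statement synthesis
            stmt, io_pairs = exec_while(model, io_pairs)
        else:                                     # plain statement: run it
            io_pairs = [(execute(stmt, s_in), s_out)
                        for s_in, s_out in io_pairs]
        program.append(stmt)                      # P <- P; S
        stmt = model(io_pairs)                    # re-query on updated states
    return program
```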
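
The next sketch collects the training settings quoted in the "Software Dependencies" and "Experiment Setup" rows (Adam optimizer, learning rates of 10^-4 for supervised training and 10^-5 for RL, batch sizes of 128 and 16). The paper does not name its framework, so PyTorch and the `synthesizer` model object here are assumptions.

```python
# Hedged sketch of the reported optimizer and REINFORCE settings; the
# framework (PyTorch) and the `synthesizer` object are assumed, not stated
# in the paper.
import torch

SUPERVISED_LR, SUPERVISED_BATCH = 1e-4, 128
RL_LR, RL_BATCH = 1e-5, 16


def make_optimizer(synthesizer, reinforcement: bool) -> torch.optim.Adam:
    """Adam optimizer with the learning rate reported for each training phase."""
    lr = RL_LR if reinforcement else SUPERVISED_LR
    return torch.optim.Adam(synthesizer.parameters(), lr=lr)


def reinforce_loss(log_probs: torch.Tensor, rewards: torch.Tensor) -> torch.Tensor:
    """REINFORCE (Williams, 1992): weight the negative log-likelihood of each
    sampled program by its reward (e.g., 1 if it satisfies all IO pairs)."""
    return -(log_probs * rewards).mean()
```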