Program Synthesis and Semantic Parsing with Learned Code Idioms
Authors: Eui Chul Shin, Miltiadis Allamanis, Marc Brockschmidt, Alex Polozov
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate PATOIS on two complex semantic parsing datasets and show that using learned code idioms improves the synthesizer's accuracy. We evaluate PATOIS on two challenging semantic parsing datasets: Hearthstone [24], a dataset of small domain-specific Python programs, and Spider [41], a large dataset of SQL queries over various databases. We find that equipping the synthesizer with learned idioms improves its accuracy in generating programs that satisfy the task description. Tables 2 and 3 show our ablation analysis of different configurations of PATOIS on the Hearthstone and Spider dev sets, respectively. Table 4 shows the test set results of the best model configuration for Hearthstone (the test instances for the Spider dataset are unreleased). |
| Researcher Affiliation | Collaboration | Richard Shin (UC Berkeley) ricshin@berkeley.edu; Miltiadis Allamanis, Marc Brockschmidt & Oleksandr Polozov (Microsoft Research) {miallama,mabrocks,polozov}@microsoft.com |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. It uses diagrams and describes processes in text but no formal algorithm presentation. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. It links to third-party datasets or prior work's code, but not its own implementation. |
| Open Datasets | Yes | We evaluate PATOIS on two challenging semantic parsing datasets: Hearthstone [24] and Spider [41]. Hearthstone [24]: URL https://github.com/deepmind/card2code. Spider [41]: URL https://yale-lily.github.io/spider. |
| Dataset Splits | Yes | We mine the idioms using the training split of each dataset. Tables 2 and 3 show our ablation analysis of different configurations of PATOIS on the Hearthstone and Spider dev sets, respectively. Database schemas do not overlap between the train and test splits, thus challenging the model to generalize across different domains. |
| Hardware Specification | Yes | Each model configuration is trained on one NVIDIA GTX 1080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch [27]' as the implementation framework but does not specify a concrete version number for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | We run type-based MCMC (Section 3) for 10 iterations with α = 5 and d = 0.5. We ran ablation experiments with K ∈ {10, 20, 40, 80}. For the Hearthstone dataset... models are trained using the Adadelta optimizer [42] with learning rate 1.0, ρ = 0.95, ε = 10⁻⁶ for up to 2,600 steps with a batch size of 10. For the Spider dataset... models are trained using the Adam optimizer [20] with β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁹ for up to 40,000 steps with a batch size of 10. The learning rate warms up linearly up to 2.5 × 10⁻⁴ during the first 2,000 steps, and then decays polynomially by (1 − t/T)^0.5 where T is the total number of steps. (A hedged PyTorch sketch of these settings appears below the table.) |
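To make the quoted optimizer settings concrete, here is a minimal PyTorch sketch. It assumes a placeholder model and implements the Spider warmup-then-decay schedule as we read it from the quote above; none of the names below come from the authors' (unreleased) code.

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the PATOIS synthesizer (assumption).
model = nn.Linear(8, 8)

# Hearthstone: Adadelta with lr = 1.0, rho = 0.95, eps = 1e-6,
# trained for up to 2,600 steps with batch size 10.
hearthstone_opt = torch.optim.Adadelta(
    model.parameters(), lr=1.0, rho=0.95, eps=1e-6)

# Spider: Adam with beta1 = 0.9, beta2 = 0.999, eps = 1e-9,
# trained for up to 40,000 steps with batch size 10.
T, WARMUP, PEAK_LR = 40_000, 2_000, 2.5e-4
spider_opt = torch.optim.Adam(
    model.parameters(), lr=PEAK_LR, betas=(0.9, 0.999), eps=1e-9)

def lr_multiplier(t: int) -> float:
    """Linear warmup to the peak rate over the first 2,000 steps,
    then polynomial decay by (1 - t/T) ** 0.5."""
    if t < WARMUP:
        return t / WARMUP
    return (1.0 - t / T) ** 0.5

# LambdaLR scales the optimizer's base lr (here PEAK_LR) by the multiplier.
spider_sched = torch.optim.lr_scheduler.LambdaLR(spider_opt, lr_lambda=lr_multiplier)
```

In a training loop, `spider_sched.step()` would be called once per optimization step so that `t` in the schedule advances with the step count.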