SPoC: Search-based Pseudocode to Code
Authors: Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, Percy S. Liang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | For evaluation, we collected the SPoC dataset (Search-based Pseudocode to Code) containing 18,356 C++ programs with human-authored pseudocode and test cases. Under a budget of 100 program compilations, performing search improves the synthesis success rate over using the top-one translation of the pseudocode from 25.6% to 44.7%. |
| Researcher Affiliation | Academia | Department of Computer Science Stanford University {sumith,ppasupat,kach,minalee,padon,aaiken,pliang}@cs.stanford.edu |
| Pseudocode | Yes | Figure 1: Given L pseudocode lines x1:L (with indentation levels ℓ1:L) and public test cases, our task is to synthesize a program with code lines y1:L. |
| Open Source Code | Yes | The dataset and code are available at https://sumith1896.github.io/spoc/. |
| Open Datasets | Yes | For evaluation, we collected and release a new dataset, SPoC (Search-based Pseudocode to Code), containing 18,356 C++ programs with human-authored pseudocode and test cases. The dataset and code are available at https://sumith1896.github.io/spoc/. |
| Dataset Splits | Yes | To evaluate the generalization on unseen problems and annotation styles, we created two test sets. We generated the first test set TESTP by splitting based on problems: we held out 158 problems (23% out of 677 problems), which is equivalent to 1,820 programs (10.1% of all programs). The second test set TESTW is split by workers: we held out 7 workers (12% out of 59 workers), which is equivalent to 1,752 programs (9.7% of all programs, with 186 programs overlapping with TESTP). We used the remaining data for training and development (90:10 split). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory, or cloud instance types) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'OpenNMT' for the translation model but does not specify any version numbers for OpenNMT or other software libraries/dependencies. |
| Experiment Setup | Yes | (We use M = 100 for our experiments.) The model also assigns a probability p_ij = p(c_ij \| x_i) for each candidate c_ij. We use β_mul = 0.95 for the experiments. Given the candidate lists C_1, ..., C_L, we can synthesize a program ŷ by picking a candidate c_{i,j[i]} from each C_i (where j[i] ∈ {1, ..., M}) and then concatenating them into a program. In our search algorithm, we iterate through programs ŷ in the descending order of probability p(ŷ) = ∏_{i=1}^{L} p_{i,j[i]}. |
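The search described in the Experiment Setup row — enumerating full programs ŷ in descending order of p(ŷ) = ∏ p_{i,j[i]} under a compilation budget — can be realized as a lazy best-first search over index tuples with a max-heap. The sketch below is a hypothetical illustration of that enumeration only (the function name and structure are mine, not the authors'; their system additionally compiles candidates and performs error localization). It assumes each line's candidate list is sorted by probability in descending order.

```python
import heapq

def best_first_programs(candidate_probs, budget=100):
    """Yield (probability, index tuple j[1..L]) in descending order of
    the product of per-line candidate probabilities, visiting at most
    `budget` programs.

    candidate_probs: list of L lists, each sorted in descending order.
    """
    L = len(candidate_probs)

    def score(idx):
        # p(y-hat) = product over lines i of p_{i, j[i]}
        p = 1.0
        for i, j in enumerate(idx):
            p *= candidate_probs[i][j]
        return p

    start = (0,) * L                      # top-one translation of every line
    heap = [(-score(start), start)]       # max-heap via negated probability
    seen = {start}
    emitted = 0
    while heap and emitted < budget:
        neg_p, idx = heapq.heappop(heap)
        yield -neg_p, idx
        emitted += 1
        # Successors: bump exactly one line to its next-best candidate.
        for i in range(L):
            if idx[i] + 1 < len(candidate_probs[i]):
                nxt = idx[:i] + (idx[i] + 1,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    heapq.heappush(heap, (-score(nxt), nxt))
```

With two lines and candidates scored [0.9, 0.1] and [0.8, 0.2], the programs come out in order 0.72, 0.18, 0.08, 0.02; the first tuple yielded corresponds to the top-one translation, matching the paper's 25.6% baseline before any further search.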