SPoC: Search-based Pseudocode to Code

Authors: Sumith Kulal, Panupong Pasupat, Kartik Chandra, Mina Lee, Oded Padon, Alex Aiken, Percy S. Liang

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | For evaluation, we collected the SPoC dataset (Search-based Pseudocode to Code) containing 18,356 C++ programs with human-authored pseudocode and test cases. Under a budget of 100 program compilations, performing search improves the synthesis success rate over using the top-one translation of the pseudocode from 25.6% to 44.7%.
Researcher Affiliation | Academia | Department of Computer Science, Stanford University. {sumith,ppasupat,kach,minalee,padon,aaiken,pliang}@cs.stanford.edu
Pseudocode | Yes | Figure 1: Given L pseudocode lines x_{1:L} (with indentation levels ℓ_{1:L}) and public test cases, our task is to synthesize a program with code lines y_{1:L}.
Open Source Code | Yes | The dataset and code are available at https://sumith1896.github.io/spoc/.
Open Datasets | Yes | For evaluation, we collected and release a new dataset, SPoC (Search-based Pseudocode to Code), containing 18,356 C++ programs with human-authored pseudocode and test cases. The dataset and code are available at https://sumith1896.github.io/spoc/.
Dataset Splits | Yes | To evaluate the generalization on unseen problems and annotation styles, we created two test sets. We generated the first test set TESTP by splitting based on problems: we held out 158 problems (23% out of 677 problems), which is equivalent to 1,820 programs (10.1% of all programs). The second test set TESTW is split by workers: we held out 7 workers (12% out of 59 workers), which is equivalent to 1,752 programs (9.7% of all programs, with 186 programs overlapping with TESTP). We used the remaining data for training and development (90:10 split). (A minimal split sketch is given after the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU or GPU models, memory, or cloud instance types) used for running its experiments.
Software Dependencies | No | The paper mentions using OpenNMT for the translation model but does not specify any version numbers for OpenNMT or other software libraries/dependencies.
Experiment Setup | Yes | (We use M = 100 for our experiments.) The model also assigns a probability p_{ij} = p(c_{ij} | x_i) for each candidate c_{ij}. We use β_mul = 0.95 for the experiments. Given the candidate lists C_1, ..., C_L, we can synthesize a program ŷ by picking a candidate c_{i,j[i]} from each C_i (where j[i] ∈ {1, ..., M}) and then concatenating them into a program. In our search algorithm, we iterate through programs ŷ in the descending order of probability p(ŷ) = ∏_{i=1}^{L} p_{i,j[i]}.
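The Experiment Setup row quotes the paper's best-first enumeration of full programs from per-line candidate lists. Below is a minimal Python sketch of that enumeration under a compilation budget; the heap-based successor expansion, the candidate-list format, and the passes_tests compile-and-run hook are illustrative assumptions, not the authors' released implementation.

```python
import heapq
import math
from typing import Callable, Iterator, List, Optional, Sequence, Tuple

def best_first_programs(
    candidates: Sequence[Sequence[Tuple[str, float]]],
) -> Iterator[Tuple[float, List[str]]]:
    """Yield full programs in descending order of p(ŷ) = ∏_i p_{i,j[i]}.

    candidates[i] is the list C_i of (code_line, probability) pairs for
    pseudocode line i, assumed sorted by descending probability (p > 0).
    """
    L = len(candidates)
    # Maximizing the sum of log-probabilities is equivalent to maximizing
    # the product of per-line probabilities.
    logp = [[math.log(p) for _, p in line] for line in candidates]

    start = (0,) * L                          # top-one translation of every line
    start_score = sum(lp[0] for lp in logp)
    heap = [(-start_score, start)]            # min-heap over negated scores
    seen = {start}

    while heap:
        neg_score, idx = heapq.heappop(heap)
        program = [candidates[i][j][0] for i, j in enumerate(idx)]
        yield math.exp(-neg_score), program
        # Successors: move exactly one line to its next-best candidate.
        for i in range(L):
            j = idx[i] + 1
            if j < len(candidates[i]):
                nxt = idx[:i] + (j,) + idx[i + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    score = -neg_score - logp[i][j - 1] + logp[i][j]
                    heapq.heappush(heap, (-score, nxt))

def synthesize(
    candidates: Sequence[Sequence[Tuple[str, float]]],
    passes_tests: Callable[[str], bool],      # hypothetical compile-and-test hook
    budget: int = 100,                        # B = 100 compilations in the paper
) -> Optional[str]:
    """Return the first enumerated program that passes the public tests."""
    for k, (_, lines) in enumerate(best_first_programs(candidates)):
        if k >= budget:
            break
        program = "\n".join(lines)
        if passes_tests(program):
            return program
    return None
```

The paper's full system goes beyond this probability-ordered baseline (e.g., using compilation errors for credit assignment during search); the sketch covers only the enumeration described in the quoted passage.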
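For the Dataset Splits row, the sketch below illustrates how a held-out-by-problem test set (TESTP), a held-out-by-worker test set (TESTW), and a 90:10 train/dev split of the remainder could be constructed. The file name and the problem_id / worker_id column names are assumptions for illustration, not confirmed field names of the released data, and the held-out sets are sampled randomly here rather than matching the paper's exact selection.

```python
import pandas as pd

# Assumed file and column names; the released SPoC data may be organized differently.
df = pd.read_csv("spoc_programs.tsv", sep="\t")

# TESTP: hold out roughly 23% of problems (158 of 677 in the paper).
problems = pd.Series(df["problem_id"].unique())
test_p_problems = set(problems.sample(frac=0.23, random_state=0))

# TESTW: hold out 7 workers (of 59 in the paper).
workers = pd.Series(df["worker_id"].unique())
test_w_workers = set(workers.sample(n=7, random_state=0))

test_p = df[df["problem_id"].isin(test_p_problems)]    # TESTP
test_w = df[df["worker_id"].isin(test_w_workers)]      # TESTW (may overlap TESTP)

# Programs in neither test set are split 90:10 into train / dev.
rest = df[~df["problem_id"].isin(test_p_problems)
          & ~df["worker_id"].isin(test_w_workers)]
rest = rest.sample(frac=1.0, random_state=0)           # shuffle
cut = int(0.9 * len(rest))
train, dev = rest.iloc[:cut], rest.iloc[cut:]
```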