Learning Program Embeddings to Propagate Feedback on Student Code

Authors: Chris Piech, Jonathan Huang, Andy Nguyen, Mike Phulsuksombati, Mehran Sahami, Leonidas Guibas

ICML 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We apply our algorithm to assessments from the Code.org Hour of Code and Stanford University's CS1 course, where we propagate human comments on student assignments to orders of magnitude more submissions. To evaluate our program embeddings we test our ability to amplify teacher feedback. We use real student data from the Code.org Hour of Code... We then show how the same approach can be used for submissions in Stanford University's Programming Methodologies course... As our results in Table 2 show, the NPM model achieves the best training accuracy (with 98%, 98% and 94% accuracy respectively, for the three problems).
Researcher Affiliation | Academia | Chris Piech (piech@cs.stanford.edu), Jonathan Huang (jonathanhuang@google.com), Andy Nguyen (tanonev@cs.stanford.edu), Mike Phulsuksombati (mikep15@cs.stanford.edu), Mehran Sahami (sahami@cs.stanford.edu), Leonidas Guibas (guibas@cs.stanford.edu)
Pseudocode | No | The paper describes its algorithms and models in prose and mathematical equations but does not provide formal pseudocode blocks or algorithms labeled as such.
Open Source Code | No | The paper does not provide a link to its source code or explicitly state that the code for its methodology is open source or publicly available. It states that "The code.org dataset is available at code.org/research", but this refers to a dataset, not the implemented methodology.
Open Datasets | Yes | To gather data, we exploit the fact that programs are executable: we can evaluate any piece of code on an arbitrary input (i.e., the precondition) and observe the state after (the postcondition). For a program and its constituent parts we can thus collect arbitrarily many such precondition/postcondition mappings. This data provides the training set from which we can learn a shared representation for programs. ... We use real student data from the Code.org Hour of Code, which has been attempted by over 27 million learners... We then show how the same approach can be used for submissions in Stanford University's Programming Methodologies course... The code.org dataset is available at code.org/research.
Dataset Splits | Yes | We split our observed Hoare triples into training and test sets and learn our NPM model using the training set. Then for each triple (P, A, Q) in the test set we measure how well we can predict the postcondition Q given the corresponding program A and precondition P.
Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments.
Software Dependencies | No | The paper mentions using random search (Bergstra & Bengio, 2012) and Adagrad (Duchi et al., 2011) for optimization, and refers to backpropagation, but does not provide specific version numbers for software dependencies or libraries.
Experiment Setup | Yes | We run joint optimization using minibatch stochastic gradient descent without momentum, using ordinary backpropagation to calculate the gradient. We use random search (Bergstra & Bengio, 2012) to optimize over hyperparameters (e.g., regularization parameters, matrix dimensions, and minibatch size). Learning rates are set using Adagrad (Duchi et al., 2011). We seed our parameters using a smart initialization in which we first learn an autoencoder on the state space, and perform a vector-valued ridge regression for each unique program to extract a matrix mapping the features of the precondition to the features of the postcondition.
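The per-program initialization quoted in the Experiment Setup row can be illustrated with a minimal sketch. The paper does not publish its implementation, so everything below is an assumption for illustration: synthetic feature dimensions, the ridge penalty `lam`, and the function name `ridge_program_matrix` are all hypothetical, and the closed-form solution stands in for whatever solver the authors actually used.

```python
import numpy as np

def ridge_program_matrix(X, Y, lam=0.1):
    """Vector-valued ridge regression for one program: find the matrix M
    minimizing ||M @ X - Y||_F^2 + lam * ||M||_F^2, where each column of X
    is a precondition feature vector and the matching column of Y is the
    observed postcondition feature vector.

    Closed form: M = Y X^T (X X^T + lam I)^{-1}.
    """
    d = X.shape[0]
    A = X @ X.T + lam * np.eye(d)          # regularized Gram matrix (symmetric)
    return np.linalg.solve(A, X @ Y.T).T   # solve instead of explicit inverse

# Synthetic sanity check (hypothetical data): recover a known linear
# precondition -> postcondition map from noisy observations.
rng = np.random.default_rng(0)
M_true = rng.normal(size=(4, 4))
X = rng.normal(size=(4, 200))
Y = M_true @ X + 0.01 * rng.normal(size=(4, 200))
M_hat = ridge_program_matrix(X, Y, lam=1e-3)
print(np.allclose(M_hat, M_true, atol=0.05))  # prints True
```

Using `np.linalg.solve` on the symmetric system avoids forming an explicit inverse; with enough precondition/postcondition samples per program and a small `lam`, the recovered matrix closely matches the generating map, which is exactly the property the initialization relies on.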