Giving Feedback on Interactive Student Programs with Meta-Exploration

Authors: Evan Liu, Moritz Stephan, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on a set of over 700K real anonymized student programs from a Code.org interactive assignment. Our approach provides feedback with 94.3% accuracy, improving over existing approaches by 17.7% and coming within 1.5% of human-level accuracy.
Researcher Affiliation | Academia | Evan Zheran Liu, Moritz Stephan, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn (Stanford University)
Pseudocode | Yes | Algorithm 1: Training episode for policy k
Open Source Code | Yes | Our code is publicly available at https://github.com/ezliu/dreamgrader, which includes Bounce as a new meta-RL testbed.
Open Datasets | Yes | Our experiments use a dataset of 711,274 real anonymized student submissions to this assignment, released by Nie et al. [30].
Dataset Splits | No | The paper states, "We use 0.5% of these programs for training, corresponding to N = 3,556 and uniformly sample from the remaining programs for testing," but does not explicitly mention a separate validation split (a split sketch follows the table).
Hardware Specification | Yes | automatic grading with DREAMGRADER requires only 1 second per program on a single NVIDIA RTX 2080 GPU, which is 180× faster.
Software Dependencies | No | The paper mentions using PyTorch and the Adam optimizer via citations but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | Unless otherwise noted, we train 3 seeds of each automated approach for 5M steps on N = 3,556 training programs, consisting of 0.5% of the dataset. (An illustrative run configuration is sketched after the table.)
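The dataset split described above (0.5% of the 711,274 programs for training, uniform sampling from the rest for testing, no stated validation split) can be pictured with a short sketch. This is a minimal illustration assuming the programs are held in a plain Python list; the function name split_programs and the test-set size are hypothetical and not taken from the released code.

```python
import random

def split_programs(programs, train_fraction=0.005, num_test=10_000, seed=0):
    """Illustrative split: a small training set (~0.5% of the data, about
    N = 3,556 of 711,274 programs) and a test set sampled uniformly from
    the remaining programs. No separate validation split is described."""
    rng = random.Random(seed)
    shuffled = list(programs)
    rng.shuffle(shuffled)

    num_train = int(len(shuffled) * train_fraction)
    train_set = shuffled[:num_train]
    remaining = shuffled[num_train:]
    test_set = rng.sample(remaining, min(num_test, len(remaining)))
    return train_set, test_set
```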
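The experiment setup row quotes the reported configuration (3 seeds, 5M steps, N = 3,556 training programs, a single NVIDIA RTX 2080). The sketch below records those numbers as a run configuration; the dataclass fields and the run_one_seed callable are assumptions for illustration, not the authors' actual interface.

```python
from dataclasses import dataclass

@dataclass
class RunConfig:
    num_seeds: int = 3                 # paper trains 3 seeds per approach
    total_env_steps: int = 5_000_000   # 5M steps per seed
    num_train_programs: int = 3_556    # 0.5% of the 711,274-program dataset
    device: str = "cuda"               # a single NVIDIA RTX 2080 is reported

def launch_all_seeds(config: RunConfig, run_one_seed):
    """Launch one training run per seed with the shared configuration;
    run_one_seed is a hypothetical training entry point."""
    for seed in range(config.num_seeds):
        run_one_seed(seed=seed, config=config)
```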