Giving Feedback on Interactive Student Programs with Meta-Exploration

Authors: Evan Liu, Moritz Stephan, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on a set of over 700K real anonymized student programs from a Code.org interactive assignment. Our approach provides feedback with 94.3% accuracy, improving over existing approaches by 17.7% and coming within 1.5% of human-level accuracy.
Researcher Affiliation | Academia | Evan Zheran Liu, Moritz Stephan, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn (Stanford University)
Pseudocode | Yes | Algorithm 1: Training episode for policy k
Open Source Code | Yes | Our code is publicly available at https://github.com/ezliu/dreamgrader, which includes Bounce as a new meta-RL testbed.
Open Datasets | Yes | Our experiments use a dataset of 711,274 real anonymized student submissions to this assignment, released by Nie et al. [30].
Dataset Splits | No | The paper states, "We use 0.5% of these programs for training, corresponding to N = 3,556 and uniformly sample from the remaining programs for testing," but does not explicitly mention a separate validation split (a split sketch follows the table).
Hardware Specification | Yes | automatic grading with DREAMGRADER requires only 1 second per program on a single NVIDIA RTX 2080 GPU, which is 180× faster.
Software Dependencies | No | The paper mentions using PyTorch and the Adam optimizer via citations but does not provide version numbers for these or other software dependencies.
Experiment Setup | Yes | Unless otherwise noted, we train 3 seeds of each automated approach for 5M steps on N = 3,556 training programs, consisting of 0.5% of the dataset. (An illustrative run configuration is sketched after the table.)
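The dataset split described above (0.5% of the 711,274 programs for training, uniform sampling from the rest for testing, no stated validation split) can be pictured with a short sketch. This is a minimal illustration assuming the programs are held in a plain Python list; the function name split_programs and the test-set size are hypothetical and not taken from the released code.

```python
import random

def split_programs(programs, train_fraction=0.005, num_test=10_000, seed=0):
    """Illustrative split: a small training set (~0.5% of the data, about
    N = 3,556 of 711,274 programs) and a test set sampled uniformly from
    the remaining programs. No separate validation split is described."""
    rng = random.Random(seed)
    shuffled = list(programs)
    rng.shuffle(shuffled)

    num_train = int(len(shuffled) * train_fraction)
    train_set = shuffled[:num_train]
    remaining = shuffled[num_train:]
    test_set = rng.sample(remaining, min(num_test, len(remaining)))
    return train_set, test_set
```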
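The experiment setup row quotes the reported configuration (3 seeds, 5M steps, N = 3,556 training programs, a single NVIDIA RTX 2080). The sketch below records those numbers as a run configuration; the dataclass fields and the run_one_seed callable are assumptions for illustration, not the authors' actual interface.

```python
from dataclasses import dataclass

@dataclass
class RunConfig:
    num_seeds: int = 3                 # paper trains 3 seeds per approach
    total_env_steps: int = 5_000_000   # 5M steps per seed
    num_train_programs: int = 3_556    # 0.5% of the 711,274-program dataset
    device: str = "cuda"               # a single NVIDIA RTX 2080 is reported

def launch_all_seeds(config: RunConfig, run_one_seed):
    """Launch one training run per seed with the shared configuration;
    run_one_seed is a hypothetical training entry point."""
    for seed in range(config.num_seeds):
        run_one_seed(seed=seed, config=config)
```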