Giving Feedback on Interactive Student Programs with Meta-Exploration
Authors: Evan Liu, Moritz Stephan, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our approach on a set of over 700K real anonymized student programs from a Code.org interactive assignment. Our approach provides feedback with 94.3% accuracy, improving over existing approaches by 17.7% and coming within 1.5% of human-level accuracy. |
| Researcher Affiliation | Academia | Evan Zheran Liu, Moritz Stephan, Allen Nie, Chris Piech, Emma Brunskill, Chelsea Finn (Stanford University) |
| Pseudocode | Yes | Algorithm 1 Training episode for policy k |
| Open Source Code | Yes | Our code is publicly available at https://github.com/ezliu/dreamgrader, which includes Bounce as a new meta-RL testbed. |
| Open Datasets | Yes | Our experiments use a dataset of 711,274 real anonymized student submissions to this assignment, released by Nie et al. [30]. |
| Dataset Splits | No | The paper states, "We use 0.5% of these programs for training, corresponding to N = 3,556 and uniformly sample from the remaining programs for testing," but does not explicitly mention a separate validation dataset split. (See the split sketch after the table.) |
| Hardware Specification | Yes | automatic grading with DREAMGRADER requires only 1 second per program on a single NVIDIA RTX 2080 GPU, which is 180× faster. |
| Software Dependencies | No | The paper mentions using PyTorch and Adam optimizer through citations but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Unless otherwise noted, we train 3 seeds of each automated approach for 5M steps on N = 3,556 training programs, consisting of 0.5% of the dataset. (A harness sketch of this protocol follows the table.) |
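
The split described in the Dataset Splits row is straightforward to reproduce in outline. Below is a minimal sketch, assuming the 711,274 submissions can be loaded as a list; `load_programs` and the test-set size are hypothetical, not taken from the paper or its released code:

```python
import random

random.seed(0)  # seed choice here is illustrative only

programs = load_programs()  # hypothetical loader for the 711,274 submissions
assert len(programs) == 711_274

# 0.5% of the dataset for training -> N = 3,556, matching the paper
n_train = round(0.005 * len(programs))
shuffled = random.sample(programs, len(programs))
train_set = shuffled[:n_train]

# Test programs are sampled uniformly from the remainder; the paper does
# not describe a separate validation split.
test_pool = shuffled[n_train:]
test_set = random.sample(test_pool, k=1_000)  # test-set size is an assumption
```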
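
The Experiment Setup row likewise maps onto a simple training harness. A minimal sketch under the assumption that `train_dream_grader` wraps the authors' released entry point (https://github.com/ezliu/dreamgrader); the function name and its signature are hypothetical:

```python
NUM_SEEDS = 3                 # the paper trains 3 seeds per approach
TOTAL_ENV_STEPS = 5_000_000   # 5M environment steps per run
N_TRAIN = 3_556               # 0.5% of the 711,274-program dataset

for seed in range(NUM_SEEDS):
    # train_dream_grader is a hypothetical wrapper, not the released API
    run = train_dream_grader(
        train_programs=train_set[:N_TRAIN],
        seed=seed,
        total_steps=TOTAL_ENV_STEPS,
    )
    print(f"seed {seed}: feedback accuracy {run.accuracy:.1%}")
```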