Synthesizing Tasks for Block-based Programming

Authors: Umair Ahmed, Maria Christakis, Aleksandr Efremov, Nigel Fernandez, Ahana Ghosh, Abhik Roychoudhury, Adish Singla

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the effectiveness of our algorithm through an extensive empirical evaluation and user study on reference tasks taken from the Hour of Code: Classic Maze challenge by Code.org and the Intro to Programming with Karel course by CodeHS.com.
Researcher Affiliation | Academia | (1) National University of Singapore, {umair, abhik}@comp.nus.edu.sg; (2) MPI-SWS, {maria, aefremov, nfernand, gahana, adishs}@mpi-sws.org
Pseudocode | No | The paper describes the algorithm's pipeline and components in text and diagrams (Figure 3), but does not provide explicit pseudocode or algorithm blocks.
Open Source Code | Yes | Our implementation is publicly available. [Footnote 2: https://github.com/adishs/neurips2020_synthesizing-tasks_code]
Open Datasets | Yes | The HOC tasks were selected from the Hour of Code: Classic Maze challenge by Code.org [23] and the Karel tasks from the Intro to Programming with Karel course by CodeHS.com [22].
Dataset Splits | No | The paper uses existing reference tasks from Code.org and CodeHS.com to generate new tasks. It does not describe any train/validation/test splits of these reference tasks for its own experimental setup or model training.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | In our implementation, we used the Z3 solver [7]. The paper mentions the Z3 solver but does not provide a specific version number (e.g., Z3 4.8.x).
Experiment Setup | Yes | As per Section 2, we set the following thresholds for our algorithm: (i) δ_size = 2, (ii) δ_diss = 0.33, and (iii) δ_qual = 0.2 for codes with While or RepeatUntil, and 0.05 otherwise. We run MCTS 10 times per code, with each run generating one task. We set the maximum iterations of a run to 2 million (M) and the exploration constant to 2 [14].
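
The settings in the Experiment Setup entry can be collected in one place when trying to reproduce the runs. The sketch below is illustrative only: the class name SynthesisConfig, its field names, and the helper uct_score are hypothetical and are not taken from the authors' repository. The score function uses the common UCB1-style form of UCT (mean value plus an exploration bonus), which is an assumption; the entry above states only the value of the exploration constant, not the exact formula.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisConfig:
    """Hypothetical container for the values quoted in the Experiment Setup entry."""
    delta_size: int = 2
    delta_diss: float = 0.33
    delta_qual_loops: float = 0.2       # codes with While or RepeatUntil
    delta_qual_default: float = 0.05    # all other codes
    mcts_runs_per_code: int = 10        # each run generates one task
    max_iterations_per_run: int = 2_000_000
    exploration_constant: float = 2.0

    def delta_qual(self, has_while_or_repeat_until: bool) -> float:
        """Pick the quality threshold that applies to a given code."""
        if has_while_or_repeat_until:
            return self.delta_qual_loops
        return self.delta_qual_default


def uct_score(mean_value: float, child_visits: int,
              parent_visits: int, c: float) -> float:
    """UCB1-style UCT score (assumed form, not confirmed by the source).

    Unvisited children get an infinite score so they are tried first.
    """
    if child_visits == 0:
        return math.inf
    bonus = c * math.sqrt(math.log(parent_visits) / child_visits)
    return mean_value + bonus


cfg = SynthesisConfig()
threshold = cfg.delta_qual(has_while_or_repeat_until=True)
score = uct_score(mean_value=0.5, child_visits=3, parent_visits=30,
                  c=cfg.exploration_constant)
```

Keeping the configuration frozen makes it harder to change a threshold by accident between the 10 runs for a given code.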