reproducibilityindex.ai

Synthesizing Tasks for Block-based Programming

Authors: Umair Ahmed, Maria Christakis, Aleksandr Efremov, Nigel Fernandez, Ahana Ghosh, Abhik Roychoudhury, Adish Singla

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our algorithm through an extensive empirical evaluation and user study on reference tasks taken from the Hour of Code: Classic Maze challenge by Code.org and the Intro to Programming with Karel course by CodeHS.com.
Researcher Affiliation	Academia	National University of Singapore ({umair, abhik}@comp.nus.edu.sg); MPI-SWS ({maria, aefremov, nfernand, gahana, adishs}@mpi-sws.org)
Pseudocode	No	The paper describes the algorithm's pipeline and components in text and diagrams (Figure 3), but does not provide explicit pseudocode or algorithm blocks.
Open Source Code	Yes	Our implementation is publicly available: https://github.com/adishs/neurips2020_synthesizing-tasks_code
Open Datasets	Yes	The HOC tasks were selected from the Hour of Code: Classic Maze challenge by Code.org [23] and the Karel tasks from the Intro to Programming with Karel course by CodeHS.com [22].
Dataset Splits	No	The paper uses existing reference tasks from Code.org and CodeHS.com to generate new tasks. It does not describe any train/validation/test splits of these reference tasks for its own experimental setup or model training.
Hardware Specification	No	The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies	No	In our implementation, we used the Z3 solver [7]. The paper names the Z3 solver but does not give a specific version number (e.g., Z3 4.8.x).
Experiment Setup	Yes	As per Section 2, we set the following thresholds for our algorithm: (i) δ_size = 2, (ii) δ_diss = 0.33, and (iii) δ_qual = 0.2 for codes with While or RepeatUntil, and 0.05 otherwise. We run MCTS 10 times per code, with each run generating one task. We set the maximum iterations of a run to 2 million (M) and the exploration constant to 2 [14].
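
For anyone re-running the experiments, the settings quoted in the Experiment Setup row can be collected in one place. The sketch below is illustrative only: the names `ExperimentConfig` and `uct_score` are hypothetical and do not come from the authors' repository, and the selection rule is the textbook UCT formula, assumed here because the row cites [14] for the exploration constant. How the authors' code actually applies the constant may differ.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the values quoted in the Experiment Setup row."""
    delta_size: int = 2
    delta_diss: float = 0.33
    delta_qual_loop: float = 0.2       # codes with While or RepeatUntil
    delta_qual_default: float = 0.05   # all other codes
    mcts_runs_per_code: int = 10       # each run generates one task
    max_iterations_per_run: int = 2_000_000
    exploration_constant: float = 2.0

    def delta_qual(self, has_while_or_repeat_until: bool) -> float:
        # Pick the quality threshold based on the constructs in the code.
        if has_while_or_repeat_until:
            return self.delta_qual_loop
        return self.delta_qual_default


def uct_score(total_value: float, visits: int, parent_visits: int, c: float) -> float:
    """Textbook UCT score (assumed form, not taken from the paper's code)."""
    if visits == 0:
        # Unvisited children are explored first.
        return math.inf
    exploitation = total_value / visits
    exploration = c * math.sqrt(math.log(parent_visits) / visits)
    return exploitation + exploration


cfg = ExperimentConfig()
score = uct_score(total_value=5.0, visits=10, parent_visits=100,
                  c=cfg.exploration_constant)
```

Keeping the thresholds in a frozen dataclass makes it harder to change them accidentally between runs, which matters here because the table reports no hardware or software versions to cross-check against.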