reproducibilityindex.ai

PLANS: Neuro-Symbolic Program Learning from Videos

Authors: Raphaël Dang-Nhu

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we report experimental observations concerning the performance of PLANS. Benchmarks: Karel (Pattis, 1981) is an educational programming language that controls a robot navigating through a grid world with walls and markers. ViZDoom (Kempka et al., 2016) is an open-source platform for Doom, a classical first-person shooter game. We use the three evaluation metrics designed by Sun et al. (2018). To measure execution accuracy, we compare if the predicted and ground-truth programs behave similarly on a fixed number of not previously observed initial states. Sequence accuracy measures if the predicted and ground-truth programs match exactly. Program accuracy is similar to sequence accuracy, with identification of some semantically equivalent programs: e.g., repeat(3) : move and move : move : move will be considered as equivalent by program accuracy. Table 1: Accuracy comparison on the main Karel and ViZDoom benchmarks.
Researcher Affiliation	Academia	Raphaël Dang-Nhu ETH Zürich dangnhur@ethz.ch
Pseudocode	No	The paper describes algorithms in text (e.g., 'dynamic filtering algorithm') and refers to supplementary material for detailed description, but does not include structured pseudocode or algorithm blocks in the main text.
Open Source Code	Yes	We make our implementation public and provide additional details about the experiments duration in the supplementary material.
Open Datasets	Yes	Benchmarks: Karel (Pattis, 1981) is an educational programming language that controls a robot navigating through a grid world with walls and markers. ViZDoom (Kempka et al., 2016) is an open-source platform for Doom, a classical first-person shooter game. We use the three evaluation metrics designed by Sun et al. (2018).
Dataset Splits	Yes	ϵa and ϵp are treated as hyperparameters and optimized on the validation dataset.
Hardware Specification	Yes	We performed all experiments on a machine with 2.00GHz Intel Xeon E5-2650 CPUs and using a single GeForce RTX 2080 Ti GPU.
Software Dependencies	No	The paper mentions 'Rosette (Torlak, Bodik, 2013)' and 'Z3 solver (De Moura, Bjørner, 2008)', but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiment.
Experiment Setup	No	The paper states 'To ensure reproducibility of our results, extensive description of hyperparameters and training process can be found in the supplementary material.' This means specific experimental setup details like hyperparameter values are not provided in the main text.
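
The three evaluation metrics quoted in the Research Type row can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper or its implementation: the toy program representation (action names plus `("repeat", n, body)` tuples), the wall-free grid state `(x, y, direction)`, and the function names `unroll`, `execute`, `sequence_match`, `program_match`, and `execution_match`. In particular, unrolling `repeat` blocks is only one possible way to identify "some semantically equivalent programs"; the actual equivalence rules of Sun et al. (2018) may differ.

```python
# Toy Karel-like programs (hypothetical representation, not the paper's DSL):
# a statement is either an action name ("move", "turnLeft", "turnRight")
# or a tuple ("repeat", n, body) where body is a list of statements.

def unroll(program):
    """Flatten repeat blocks into a plain list of action names."""
    actions = []
    for stmt in program:
        if isinstance(stmt, tuple):
            _, count, body = stmt
            actions.extend(unroll(body) * count)
        else:
            actions.append(stmt)
    return actions

# Unit steps indexed by direction: 0 = north, 1 = east, 2 = south, 3 = west.
STEPS = [(0, 1), (1, 0), (0, -1), (-1, 0)]

def execute(program, state):
    """Run a program from state (x, y, direction) on a grid without walls."""
    x, y, direction = state
    for action in unroll(program):
        if action == "move":
            dx, dy = STEPS[direction]
            x, y = x + dx, y + dy
        elif action == "turnLeft":
            direction = (direction - 1) % 4
        elif action == "turnRight":
            direction = (direction + 1) % 4
        else:
            raise ValueError(f"unknown action: {action}")
    return (x, y, direction)

def sequence_match(pred, truth):
    """Sequence accuracy criterion: the two programs are written identically."""
    return pred == truth

def program_match(pred, truth):
    """Program accuracy criterion (assumed): equal after unrolling repeats."""
    return unroll(pred) == unroll(truth)

def execution_match(pred, truth, held_out_states):
    """Execution accuracy criterion: same final state on every unseen start."""
    return all(execute(pred, s) == execute(truth, s) for s in held_out_states)

held_out = [(0, 0, 0), (2, 5, 1), (-1, 3, 2)]

# The example from the table: repeat(3) : move versus move : move : move.
looped = [("repeat", 3, ["move"])]
flat = ["move", "move", "move"]

# Three left turns and one right turn differ as text but behave the same.
three_lefts = ["turnLeft", "turnLeft", "turnLeft"]
one_right = ["turnRight"]
```

With these definitions, `looped` and `flat` fail `sequence_match` but pass both `program_match` and `execution_match`, while `three_lefts` and `one_right` pass only `execution_match`. This mirrors the ordering implied by the quoted text: sequence accuracy is the strictest criterion and execution accuracy the most lenient.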