PLANS: Neuro-Symbolic Program Learning from Videos

Authors: Raphaël Dang-Nhu

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we report experimental observations concerning the performance of PLANS. Benchmarks: Karel (Pattis, 1981) is an educational programming language that controls a robot navigating through a grid world with walls and markers. ViZDoom (Kempka et al., 2016) is an open-source platform for Doom, a classical first-person shooter game. We use the three evaluation metrics designed by Sun et al. (2018). To measure execution accuracy, we compare if the predicted and ground-truth programs behave similarly on a fixed number of not previously observed initial states. Sequence accuracy measures if the predicted and ground-truth programs match exactly. Program accuracy is similar to sequence accuracy, with identification of some semantically equivalent programs: e.g., repeat(3) : move and move : move : move will be considered as equivalent by program accuracy. Table 1: Accuracy comparison on the main Karel and ViZDoom benchmarks.
Researcher Affiliation | Academia | Raphaël Dang-Nhu, ETH Zürich, dangnhur@ethz.ch
Pseudocode | No | The paper describes algorithms in text (e.g., 'dynamic filtering algorithm') and refers to the supplementary material for a detailed description, but does not include structured pseudocode or algorithm blocks in the main text.
Open Source Code | Yes | We make our implementation public and provide additional details about the experiments duration in the supplementary material.
Open Datasets | Yes | Benchmarks: Karel (Pattis, 1981) is an educational programming language that controls a robot navigating through a grid world with walls and markers. ViZDoom (Kempka et al., 2016) is an open-source platform for Doom, a classical first-person shooter game. We use the three evaluation metrics designed by Sun et al. (2018).
Dataset Splits | Yes | ϵa and ϵp are treated as hyperparameters and optimized on the validation dataset.
Hardware Specification | Yes | We performed all experiments on a machine with 2.00GHz Intel Xeon E5-2650 CPUs and using a single GeForce RTX 2080 Ti GPU.
Software Dependencies | No | The paper mentions 'Rosette (Torlak, Bodik, 2013)' and 'Z3 solver (De Moura, Bjørner, 2008)', but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiments.
Experiment Setup | No | The paper states 'To ensure reproducibility of our results, extensive description of hyperparameters and training process can be found in the supplementary material.' This means specific experimental setup details, such as hyperparameter values, are not provided in the main text.
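
The three evaluation metrics quoted under Research Type (sequence, program, and execution accuracy) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the paper or its released code: the toy one-dimensional world, the `move` and `turn` actions, the nested-tuple program encoding, and the choice to treat programs as equivalent when they are identical after unrolling `repeat` loops. The actual equivalence rules used by Sun et al. (2018) may differ.

```python
# Hypothetical toy DSL: a program is a list of statements. A statement is
# either an action string ("move" or "turn") or a tuple ("repeat", n, body).

def unroll(program):
    """Expand every repeat loop into a flat list of actions."""
    flat = []
    for stmt in program:
        if isinstance(stmt, tuple) and stmt[0] == "repeat":
            _, n, body = stmt
            flat.extend(unroll(body) * n)
        else:
            flat.append(stmt)
    return flat

def run(program, state):
    """Toy semantics (an assumption): an agent on a line, state = (pos, direction)."""
    pos, direction = state
    for action in unroll(program):
        if action == "move":
            pos += direction
        elif action == "turn":
            direction = -direction
    return (pos, direction)

def sequence_match(pred, truth):
    """Sequence accuracy: the two programs must match exactly."""
    return pred == truth

def program_match(pred, truth):
    """Program accuracy: exact match after unrolling loops (assumed normalization)."""
    return unroll(pred) == unroll(truth)

def execution_match(pred, truth, held_out_states):
    """Execution accuracy: same behavior on initial states not seen before."""
    return all(run(pred, s) == run(truth, s) for s in held_out_states)

def accuracy(matches):
    """Fraction of examples for which the chosen metric holds."""
    return sum(matches) / len(matches)

held_out = [(0, 1), (5, -1), (-2, 1)]
looped = [("repeat", 3, ["move"])]
unrolled = ["move", "move", "move"]
detour = ["turn", "turn", "move"]
single = ["move"]
```

In this sketch, `looped` and `unrolled` fail `sequence_match` but pass `program_match`, mirroring the repeat(3) : move example in the table. `detour` and `single` differ under both syntactic metrics, yet `execution_match` accepts them because two turns cancel out in the toy semantics.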