PLANS: Neuro-Symbolic Program Learning from Videos
Authors: Raphaël Dang-Nhu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we report experimental observations concerning the performance of PLANS. Benchmarks Karel (Pattis, 1981) is an educational programming language that controls a robot navigating through a grid world with walls and markers. ViZDoom (Kempka et al., 2016) is an open-source platform for Doom, a classical first-person shooter game. We use the three evaluation metrics designed by Sun et al. (2018). To measure execution accuracy, we compare if the predicted and ground-truth programs behave similarly on a fixed number of not previously observed initial states. Sequence accuracy measures if the predicted and ground-truth programs match exactly. Program accuracy is similar to sequence accuracy, with identification of some semantically equivalent programs: e.g., repeat(3) : move and move : move : move will be considered as equivalent by program accuracy. Table 1: Accuracy comparison on the main Karel and ViZDoom benchmarks. |
| Researcher Affiliation | Academia | Raphaël Dang-Nhu ETH Zürich dangnhur@ethz.ch |
| Pseudocode | No | The paper describes algorithms in text (e.g., 'dynamic filtering algorithm') and refers to supplementary material for detailed description, but does not include structured pseudocode or algorithm blocks in the main text. |
| Open Source Code | Yes | We make our implementation public and provide additional details about the experiments duration in the supplementary material. |
| Open Datasets | Yes | Benchmarks Karel (Pattis, 1981) is an educational programming language that controls a robot navigating through a grid world with walls and markers. ViZDoom (Kempka et al., 2016) is an open-source platform for Doom, a classical first-person shooter game. We use the three evaluation metrics designed by Sun et al. (2018). |
| Dataset Splits | Yes | ϵa and ϵp are treated as hyperparameters and optimized on the validation dataset. |
| Hardware Specification | Yes | We performed all experiments on a machine with 2.00GHz Intel Xeon E5-2650 CPUs and using a single GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'Rosette (Torlak, Bodik, 2013)' and 'Z3 solver (De Moura, Bjørner, 2008)', but does not provide specific version numbers for these or any other software dependencies needed to replicate the experiment. |
| Experiment Setup | No | The paper states 'To ensure reproducibility of our results, extensive description of hyperparameters and training process can be found in the supplementary material.' This means specific experimental setup details like hyperparameter values are not provided in the main text. |
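
The three evaluation metrics quoted in the Research Type row (sequence, program, and execution accuracy) can be illustrated with a minimal Python sketch. It is not the implementation used in the paper or by Sun et al. (2018): the program representation, the toy one-dimensional world, and the helper names `unroll`, `execute`, `sequence_match`, `program_match`, and `execution_match` are all assumptions made for illustration. In particular, the sketch treats two programs as equivalent under program accuracy only when they agree after `repeat` loops are unrolled, which covers the `repeat(3) : move` example but is likely narrower than the equivalences actually identified.

```python
# Hypothetical toy representation (not from the paper): a program is a list of
# statements; a statement is an action name such as "move" or "turn", or a
# tuple ("repeat", n, body) where body is itself a program.

def unroll(program):
    """Flatten repeat loops into a plain sequence of actions."""
    flat = []
    for stmt in program:
        if isinstance(stmt, tuple) and stmt[0] == "repeat":
            _, count, body = stmt
            flat.extend(unroll(body) * count)
        else:
            flat.append(stmt)
    return flat

def execute(program, state):
    """Run a program in an assumed 1-D world; state is (position, direction)."""
    position, direction = state
    for action in unroll(program):
        if action == "move":
            position += direction
        elif action == "turn":
            direction = -direction
    return (position, direction)

def sequence_match(predicted, truth):
    """Sequence accuracy criterion: the two programs match exactly."""
    return predicted == truth

def program_match(predicted, truth):
    """Program accuracy criterion (simplified): equal after unrolling loops."""
    return unroll(predicted) == unroll(truth)

def execution_match(predicted, truth, initial_states):
    """Execution accuracy criterion: same final state on every held-out state."""
    return all(execute(predicted, s) == execute(truth, s) for s in initial_states)

# Held-out initial states, chosen arbitrarily for this example.
held_out = [(0, 1), (5, -1), (-3, 1)]

looped = [("repeat", 3, ["move"])]
unrolled = ["move", "move", "move"]
print(sequence_match(looped, unrolled))             # exact match fails
print(program_match(looped, unrolled))              # equivalent after unrolling
print(execution_match(looped, unrolled, held_out))  # same behaviour
```

In this sketch the three criteria are progressively more permissive: `["turn", "turn"]` and the empty program differ under both sequence and program matching, yet behave identically on every initial state, so only the execution criterion accepts them.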