Active Exploration for Learning Symbolic Representations

Authors: Garrett Andersen, George Konidaris

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show that our algorithm outperforms random and greedy exploration policies on two different computer game domains. The first domain is an Asteroids-inspired game with complex dynamics but basic logical structure. The second is the Treasure Game, with simpler dynamics but more complex logical structure. [...] Simulation results for the Asteroids domain. Each bar represents the average of 100 runs. The error bars represent a 99% confidence interval for the mean. (a), (b), (c): The fraction of time that the agent spends on asteroids 1, 3, and 4, respectively. The greedy and random exploration policies spend significantly more time than our algorithm exploring asteroid 1 and significantly less time exploring asteroids 3 and 4. (d): The number of symbolic transitions that the agent has not observed (out of 115 possible). The greedy and random policies require 2-3 times as many option executions to match the performance of our algorithm. (Figure 3)
Researcher Affiliation | Collaboration | Garrett Andersen, PROWLER.io, Cambridge, United Kingdom, garrett@prowler.io; George Konidaris, Department of Computer Science, Brown University, gdk@cs.brown.edu
Pseudocode | Yes | Algorithm 1 Fast Construction of a Distribution over Symbolic Option Models [...] Algorithm 2 Optimal Exploration
Open Source Code | No | The paper does not provide any information or links regarding open-source code for the described methodology.
Open Datasets | No | The paper introduces two custom game domains (Asteroids and Treasure Game) for experiments but does not provide access information (link, citation, repository) for the data collected within these domains, nor does it refer to them as publicly available datasets.
Dataset Splits | No | The paper describes its experimental setup, including 100 simulation runs, but does not specify training, validation, or test dataset splits in the traditional sense, because the data is collected dynamically during exploration in the game environments.
Hardware Specification | No | The paper does not provide any specific details about the hardware used to run the experiments (e.g., CPU, GPU models, memory, or cloud instances).
Software Dependencies | No | The paper mentions 'pybox2d' as a physics simulator but does not provide version numbers for this or any other software dependency.
Experiment Setup | Yes | The hyperparameter settings that we use for our algorithm are given in Appendix A.
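
The Research Type entry quotes results reported as the average of 100 runs with error bars giving a 99% confidence interval for the mean. The paper excerpt does not say how that interval was computed, so the following is only a minimal sketch of one common approach, assuming a normal approximation to the sampling distribution of the mean. The function name mean_confidence_interval and the sample data are hypothetical and are not taken from the paper.

```python
import math
import random
import statistics


def mean_confidence_interval(samples, confidence=0.99):
    """Return (mean, half_width) of a normal-approximation CI for the mean.

    Assumption: the normal approximation is used here for illustration;
    the paper does not state which interval construction it used.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = statistics.fmean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    # Two-sided critical value, e.g. about 2.576 for 99% confidence.
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2.0)
    return mean, z * sem


# Hypothetical per-run measurements standing in for 100 simulation runs.
rng = random.Random(0)
runs = [rng.gauss(40.0, 5.0) for _ in range(100)]

mean, half_width = mean_confidence_interval(runs)
print(f"mean = {mean:.2f}, 99% CI half-width = {half_width:.2f}")
```

With 100 runs, a Student's t critical value would give a slightly wider interval than the normal approximation used in this sketch.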