A Boolean Task Algebra for Reinforcement Learning
Authors: Geraud Nangue Tasse, Steven James, Benjamin Rosman
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify our approach in two domains including a high-dimensional video game environment requiring function approximation where an agent first learns a set of base skills, and then composes them to solve a super-exponential number of new tasks. We illustrate our approach in the Four Rooms domain (Sutton et al., 1999), where an agent must navigate a grid world to a particular location. We then demonstrate composition in a high-dimensional video game environment, where an agent first learns to collect different objects, and then composes these abilities to solve complex tasks immediately. Our results show that, even when function approximation is required, an agent can leverage its existing skills to solve new tasks without further learning. |
| Researcher Affiliation | Academia | Geraud Nangue Tasse, Steven James, Benjamin Rosman; School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa |
| Pseudocode | Yes | The full pseudocode is listed in the supplementary material. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the methodology described in this paper. |
| Open Datasets | No | The paper describes the environments used, such as the "Four Rooms domain (Sutton et al., 1999)" and a "video game environment as Van Niekerk et al. (2019)", which are conceptual or from prior work. However, it does not provide concrete access information (link, DOI, repository, or formal citation with authors/year for a dataset) for any specific dataset used for training in this paper. |
| Dataset Splits | No | The paper does not provide specific dataset split information (percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (like exact GPU/CPU models or processor types) used for running its experiments. |
| Software Dependencies | No | The paper mentions modifying Q-learning and deep Q-learning but does not provide specific software names with version numbers (e.g., library versions like PyTorch 1.9, TensorFlow 2.x). |
| Experiment Setup | No | While the paper states that "The hyperparameters and network architecture are listed in the supplementary material" in Section 5, it does not provide these specific experimental setup details (concrete hyperparameter values, training configurations, or system-level settings) in the main text of the paper. |