Skill Machines: Temporal Logic Skill Composition in Reinforcement Learning
Authors: Geraud Nangue Tasse, Devon Jarvis, Steven James, Benjamin Rosman
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3. Empirical and qualitative results: We demonstrate our approach in several environments, including a high-dimensional video game and a continuous control environment. Our results indicate that our method is capable of producing near-optimal to optimal behaviour for a variety of long-horizon tasks without further learning, including empirical results that far surpass all the representative state-of-the-art baselines. and 4 EXPERIMENTS: We evaluate our approach in three domains, including a high-dimensional, continuous control task. In particular, we consider the Office Gridworld (Figure A2a), the Moving Targets domain (Figure A1) and the Safety Gym domain (Figure 1). |
| Researcher Affiliation | Academia | Geraud Nangue Tasse, Devon Jarvis, Steven James & Benjamin Rosman School of Computer Science and Applied Mathematics University of the Witwatersrand Johannesburg, South Africa {geraud.nanguetasse1, devon.jarvis, steven.james, benjamin.rosman1}@wits.ac.za |
| Pseudocode | Yes | A.2 FULL PSEUDO-CODES OF FRAMEWORK: Algorithm 1: Q-learning for skill primitives ... Algorithm 2: Skill machine from reward machine ... Algorithm 3: Zero-shot and Few-shot Q-learning with skill machines (A generic sketch of the Q-learning loop underlying Algorithm 1 is given after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | 4 EXPERIMENTS: We evaluate our approach in three domains, including a high-dimensional, continuous control task. In particular, we consider the Office Gridworld (Figure A2a), the Moving Targets domain (Figure A1) and the Safety Gym domain (Figure 1). The domains are attributed as: Office Gridworld (Icarte et al., 2022); Moving Targets Domain (Nangue Tasse et al., 2020); Safety Gym Domain (Ray et al., 2019). |
| Dataset Splits | No | The paper does not explicitly detail dataset splits (e.g., percentages or sample counts) for training, validation, and test sets. It mentions evaluation during training but not specific data partitioning. |
| Hardware Specification | No | Computations were performed using the High Performance Computing Infrastructure provided by the Mathematical Sciences Support unit at the University of the Witwatersrand. (This statement is too general and does not specify concrete hardware details such as GPU models, CPU types, or memory.) |
| Software Dependencies | No | The paper references various algorithms and frameworks (e.g., 'Defaults of Mnih et al. (2015)', 'Defaults of Achiam (2018)') but does not provide specific version numbers for software libraries, programming languages, or solvers required for reproducibility. |
| Experiment Setup | Yes | The full list of hyper-parameters for the Office World, Moving Targets and Safe AI Gym domain experiments is shown in Tables A1-A3 respectively. (These tables specify values such as timesteps, exploration epsilon, discount factor, and MLP hidden layer sizes.) |
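
For context on the Pseudocode row: the paper's Algorithm 1 learns skill primitives with standard Q-learning. Below is a minimal, generic sketch of such a tabular Q-learning loop, not the paper's actual code; the environment interface (`reset`, `step`, `num_actions`) and all names are illustrative Gym-style assumptions, and the skill-machine composition of Algorithms 2 and 3 is not reproduced here.

```python
import random
from collections import defaultdict

def q_learning(env, num_steps=100_000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular epsilon-greedy Q-learning for a single skill primitive.

    `env` is assumed to expose a minimal Gym-style interface:
    reset() -> state, step(action) -> (next_state, reward, done),
    plus an integer attribute `num_actions`. States must be hashable.
    """
    Q = defaultdict(lambda: [0.0] * env.num_actions)
    state = env.reset()
    for _ in range(num_steps):
        # Epsilon-greedy action selection over the tabular Q-values.
        if random.random() < epsilon:
            action = random.randrange(env.num_actions)
        else:
            action = max(range(env.num_actions), key=lambda a: Q[state][a])

        next_state, reward, done = env.step(action)

        # One-step temporal-difference update toward the bootstrapped target.
        target = reward + (0.0 if done else gamma * max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])

        state = env.reset() if done else next_state
    return Q
```

In the paper's framework, one such value function is learned per skill primitive; the skill machine then composes these learned skills (roughly, Boolean composition via min/max over value functions, following Nangue Tasse et al. (2020)) to satisfy temporal-logic task specifications without further learning.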