Skill Machines: Temporal Logic Skill Composition in Reinforcement Learning
Authors: Geraud Nangue Tasse, Devon Jarvis, Steven James, Benjamin Rosman
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3. Empirical and qualitative results: We demonstrate our approach in several environments, including a high-dimensional video game and a continuous control environment. Our results indicate that our method is capable of producing near-optimal to optimal behaviour for a variety of long-horizon tasks without further learning, including empirical results that far surpass all the representative state-of-the-art baselines. and 4 EXPERIMENTS: We evaluate our approach in three domains, including a high-dimensional, continuous control task. In particular, we consider the Office Gridworld (Figure A2a), the Moving Targets domain (Figure A1) and the Safety Gym domain (Figure 1). |
| Researcher Affiliation | Academia | Geraud Nangue Tasse, Devon Jarvis, Steven James & Benjamin Rosman School of Computer Science and Applied Mathematics University of the Witwatersrand Johannesburg, South Africa {geraud.nanguetasse1, devon.jarvis, steven.james, benjamin.rosman1}@wits.ac.za |
| Pseudocode | Yes | A.2 FULL PSEUDO-CODES OF FRAMEWORK: Algorithm 1: Q-learning for skill primitives ... Algorithm 2: Skill machine from reward machine ... Algorithm 3: Zero-shot and Few-shot Q-learning with skill machines (A generic sketch of the Q-learning loop underlying Algorithm 1 is given after the table.) |
| Open Source Code | No | The paper does not provide an explicit statement or link indicating that the source code for the described methodology is publicly available. |
| Open Datasets | Yes | 4 EXPERIMENTS: We evaluate our approach in three domains, including a high-dimensional, continuous control task. In particular, we consider the Office Gridworld (Figure A2a), the Moving Targets domain (Figure A1) and the Safety Gym domain (Figure 1). The domains are attributed as: Office Gridworld (Icarte et al., 2022); Moving Targets Domain (Nangue Tasse et al., 2020); Safety Gym Domain (Ray et al., 2019). |
| Dataset Splits | No | The paper does not explicitly detail dataset splits (e.g., percentages or sample counts) for training, validation, and test sets. It mentions evaluation during training but not specific data partitioning. |
| Hardware Specification | No | Computations were performed using the High Performance Computing Infrastructure provided by the Mathematical Sciences Support unit at the University of the Witwatersrand. (This statement is too general and does not specify concrete hardware details such as GPU models, CPU types, or memory.) |
| Software Dependencies | No | The paper references various algorithms and frameworks (e.g., 'Defaults of Mnih et al. (2015)', 'Defaults of Achiam (2018)') but does not provide specific version numbers for software libraries, programming languages, or solvers required for reproducibility. |
| Experiment Setup | Yes | The full list of hyper-parameters for the Office World, Moving Targets and Safe AI Gym domain experiments is shown in Tables A1-A3 respectively. (These tables specify values such as timesteps, exploration epsilon, discount factor, and MLP hidden layer sizes.) |
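
For context on the Pseudocode row: the paper's Algorithm 1 learns skill primitives with standard Q-learning. Below is a minimal, generic sketch of such a tabular Q-learning loop, not the paper's actual code; the environment interface (`reset`, `step`, `num_actions`) and all names are illustrative Gym-style assumptions, and the skill-machine composition of Algorithms 2 and 3 is not reproduced here.

```python
import random
from collections import defaultdict

def q_learning(env, num_steps=100_000, alpha=0.1, gamma=0.99, epsilon=0.1):
    """Tabular epsilon-greedy Q-learning for a single skill primitive.

    `env` is assumed to expose a minimal Gym-style interface:
    reset() -> state, step(action) -> (next_state, reward, done),
    plus an integer attribute `num_actions`. States must be hashable.
    """
    Q = defaultdict(lambda: [0.0] * env.num_actions)
    state = env.reset()
    for _ in range(num_steps):
        # Epsilon-greedy action selection over the tabular Q-values.
        if random.random() < epsilon:
            action = random.randrange(env.num_actions)
        else:
            action = max(range(env.num_actions), key=lambda a: Q[state][a])

        next_state, reward, done = env.step(action)

        # One-step temporal-difference update toward the bootstrapped target.
        target = reward + (0.0 if done else gamma * max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])

        state = env.reset() if done else next_state
    return Q
```

In the paper's framework, one such value function is learned per skill primitive; the skill machine then composes these learned skills (roughly, Boolean composition via min/max over value functions, following Nangue Tasse et al. (2020)) to satisfy temporal-logic task specifications without further learning.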