Reinforcement Learning with Option Machines

Authors: Floris den Hengst, Vincent François-Lavet, Mark Hoogendoorn, Frank van Harmelen

IJCAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment: each variable below is listed with its result and the supporting LLM response.

Research Type: Experimental
LLM Response: We compare our approach in zero-shot, single- and multi-task settings in an environment with fully specified and underspecified instructions. We find that OMs perform significantly better than or comparable to the state of the art in all environments and learning settings.

Researcher Affiliation: Collaboration
LLM Response: Floris den Hengst (1, 2), Vincent François-Lavet (2), Mark Hoogendoorn (2), Frank van Harmelen (2); affiliations: 1 = ING Bank N.V., 2 = Vrije Universiteit Amsterdam.

Pseudocode: Yes
LLM Response: Algorithm 1 (Main loop), Algorithm 2 (Control with an Option Machine), and Algorithm 3 (Learning with Option Machines).
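
For intuition, below is a minimal sketch of the kind of control loop that Algorithm 2's title suggests: a finite-state machine whose states select options (sub-policies) and whose transitions fire on abstract environment events. The class names, environment API, and toy task are illustrative assumptions, not the paper's pseudocode; option termination is folded into the machine transitions for brevity.

```python
# Hedged sketch, not the paper's code: a finite-state machine over
# options ("option machine") decides which sub-policy acts at each step.
# Every name below, including the toy environment, is an assumption.

class ChainEnv:
    """Toy 1-D environment: the agent walks along a line and emits
    abstract events at fixed positions (loosely craft-flavoured)."""
    EVENTS = {3: "got_wood", 6: "at_workshop"}

    def reset(self):
        self.pos = 0
        return self.pos

    def step(self, action):                 # action is +1 or -1
        self.pos = max(0, self.pos + action)
        event = self.EVENTS.get(self.pos)   # None if no event here
        return self.pos, 0.0, self.pos >= 9, event

class OptionMachine:
    """FSM whose states name options; transitions fire on events."""
    def __init__(self, options, transitions, initial, final):
        self.options, self.transitions = options, transitions
        self.state, self.final = initial, final

    def advance(self, event):
        self.state = self.transitions.get((self.state, event), self.state)

    def done(self):
        return self.state in self.final

def run_episode(env, machine, max_steps=100):
    obs = env.reset()
    for _ in range(max_steps):
        policy = machine.options[machine.state]   # current sub-policy
        obs, _, terminated, event = env.step(policy(obs))
        if event is not None:
            machine.advance(event)                # record task progress
        if terminated or machine.done():
            break
    return machine.done()

# "Get wood, then go to the workshop" as a two-state machine; both
# options here are the trivial walk-right policy.
machine = OptionMachine(
    options={"u0": lambda s: +1, "u1": lambda s: +1},
    transitions={("u0", "got_wood"): "u1", ("u1", "at_workshop"): "uF"},
    initial="u0",
    final={"uF"},
)
print(run_episode(ChainEnv(), machine))  # True: the instruction was completed
```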

Open Source Code: No
LLM Response: The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology.

Open Datasets: Yes
LLM Response: Two benchmark environments by [Andreas et al., 2017] are used to evaluate the approach. In the craft environment, items can be obtained by collecting resources such as wood and iron and combining them at workshop locations. In the maze environment, the agent must navigate a series of rooms with doors.

Dataset Splits: No
LLM Response: The paper describes a curriculum learning setup and mentions training on a set of tasks and evaluating on "held out tasks" in the zero-shot setting, but it does not provide specific numerical train/validation/test splits (e.g., percentages or sample counts) for the data used in the experiments.
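
To make the held-out evaluation protocol concrete, here is a minimal sketch of the kind of task split a zero-shot setting implies; the task identifiers and the split size are assumptions, since the paper reports no numeric splits.

```python
# Hedged sketch: hold out a subset of tasks for zero-shot evaluation.
# The task list and the 15/5 split are assumptions, not reported values.
import random

tasks = [f"task_{i}" for i in range(20)]  # hypothetical task identifiers
rng = random.Random(0)                    # fixed seed for repeatability
rng.shuffle(tasks)

held_out = tasks[:5]      # never trained on; used for zero-shot evaluation
train_tasks = tasks[5:]   # used for (curriculum) training
print(len(train_tasks), len(held_out))    # -> 15 5
```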

Hardware Specification: No
LLM Response: The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments.

Software Dependencies: No
LLM Response: The paper mentions using "AC as the base learner" but does not specify any software libraries or tools with version numbers (e.g., Python, PyTorch, or TensorFlow versions) that were used for the implementation. It refers to an Appendix for more details, but this information is not in the main text.
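
Since the base learner is named only as "AC" (presumably actor-critic), the following is a minimal one-step tabular actor-critic sketch for orientation; the tabular representation, hyperparameters, and class name are assumptions, not the paper's implementation.

```python
# Hedged sketch of a one-step tabular actor-critic ("AC") learner.
# All hyperparameters and the tabular form are assumptions.
import numpy as np

class ActorCritic:
    def __init__(self, n_states, n_actions, alpha=0.1, beta=0.1, gamma=0.99):
        self.theta = np.zeros((n_states, n_actions))  # policy logits (actor)
        self.v = np.zeros(n_states)                   # state values (critic)
        self.alpha, self.beta, self.gamma = alpha, beta, gamma

    def policy(self, s):
        logits = self.theta[s] - self.theta[s].max()  # stable softmax
        p = np.exp(logits)
        return p / p.sum()

    def act(self, s, rng):
        return rng.choice(len(self.theta[s]), p=self.policy(s))

    def update(self, s, a, r, s_next, done):
        # One-step TD error doubles as the advantage estimate.
        target = r + (0.0 if done else self.gamma * self.v[s_next])
        delta = target - self.v[s]
        self.v[s] += self.beta * delta                # critic step
        grad = -self.policy(s)                        # d log pi(a|s) / d logits
        grad[a] += 1.0
        self.theta[s] += self.alpha * delta * grad    # actor step

# Usage: agent = ActorCritic(n_states=10, n_actions=4)
#        a = agent.act(s, np.random.default_rng(0))
```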

Experiment Setup: Yes
LLM Response: Shaping reward hyperparameters ρ = 0 and ρ = 0.1 were selected for the maze and craft environments, respectively. A detailed description of the environments, tasks, hyperparameters, etc. can be found in the Appendix.
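
For illustration, here is a hedged sketch of how a shaping coefficient ρ might enter the reward; the progress-based bonus is an assumption, since the exact shaping signal is described in the paper's Appendix rather than in the main text.

```python
# Hedged sketch: a shaping coefficient rho scales an auxiliary bonus on
# top of the environment reward. The progress bonus is an assumption.
def shaped_reward(env_reward, progress_bonus, rho):
    """Environment reward plus a rho-weighted shaping bonus."""
    return env_reward + rho * progress_bonus

# rho = 0.1 (craft setting): task progress yields a small bonus.
print(shaped_reward(env_reward=0.0, progress_bonus=1.0, rho=0.1))  # 0.1
# rho = 0 (maze setting): shaping is disabled entirely.
print(shaped_reward(env_reward=0.0, progress_bonus=1.0, rho=0.0))  # 0.0
```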