Reinforcement Learning with Option Machines
Authors: Floris den Hengst, Vincent François-Lavet, Mark Hoogendoorn, Frank van Harmelen
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our approach in zero-shot, single- and multi-task settings in an environment with fully specified and underspecified instructions. We find that OMs perform significantly better than or comparable to the state of the art in all environments and learning settings. |
| Researcher Affiliation | Collaboration | Floris den Hengst¹,², Vincent François-Lavet², Mark Hoogendoorn², Frank van Harmelen² (¹ING Bank N.V., ²Vrije Universiteit Amsterdam) |
| Pseudocode | Yes | Algorithm 1 Main loop; Algorithm 2 Control with an Option Machine; Algorithm 3 Learning with Option Machines (a hedged sketch of such a control loop appears after this table) |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Two benchmark environments by [Andreas et al., 2017] are used to evaluate the approach. In the craft environment, items can be obtained by collecting resources such as wood and iron and combining them at workshop locations. In the maze environment, the agent must navigate a series of rooms with doors. |
| Dataset Splits | No | The paper describes a curriculum learning setup and mentions training on a set of tasks and evaluating on "held out tasks" for zero-shot settings, but it does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts) for the data used in experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions using "AC as the base learner" but does not specify any software libraries or tools with their version numbers (e.g., Python, PyTorch, TensorFlow versions) that were used for the implementation. It refers to an Appendix for more details, but this information is not in the main text. |
| Experiment Setup | Yes | Shaping reward hyperparameters ρ = 0 and ρ = 0.1 were selected for the maze and craft environments, respectively. A detailed description of the environments, tasks, hyperparameters, etc. can be found in the Appendix. (A minimal shaping sketch appears below the table.) |
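The paper's pseudocode is not reproduced here, so the following is only a minimal sketch of what control with an option machine (Algorithm 2 above) could look like, assuming the machine is a finite-state controller that maps each machine state to an option (a sub-policy with a termination condition) and advances on high-level events extracted from observations. All names (`OptionMachine`, `Option`, `labeling_fn`, `run_episode`) and the Gym-style `env.step` interface are hypothetical illustrations, not the paper's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass
class Option:
    policy: Callable[[object], int]        # maps an observation to an action
    terminates: Callable[[object], bool]   # beta(s): should this option stop?


@dataclass
class OptionMachine:
    options: Dict[str, Option]               # machine state -> active option
    transitions: Dict[Tuple[str, str], str]  # (machine state, event) -> next state
    initial: str = "u0"
    accepting: str = "u_acc"


def run_episode(env, machine: OptionMachine, labeling_fn, max_steps=500):
    """Roll out one episode, switching options as the machine advances."""
    obs = env.reset()
    u = machine.initial
    for _ in range(max_steps):
        option = machine.options[u]
        obs, reward, done, info = env.step(option.policy(obs))
        if option.terminates(obs):
            # High-level events (e.g. "got_wood") drive machine transitions.
            u = machine.transitions.get((u, labeling_fn(obs)), u)
        if u == machine.accepting or done:
            break
    return u == machine.accepting
```

In this reading, the base learner (AC, per the software-dependencies row) would be responsible for training the per-option policies, while the machine itself is derived from the instruction.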
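Similarly, the shaping coefficients ρ = 0 (maze) and ρ = 0.1 (craft) reported in the experiment-setup row suggest an additive shaping term. The sketch below assumes that form; the shaping signal `phi` is a hypothetical stand-in, as the paper defers the exact term to its Appendix.

```python
def shaped_reward(env_reward: float, phi: float, rho: float) -> float:
    """Combine the environment reward with a shaping signal weighted by rho.

    With rho = 0 (maze) shaping is disabled; rho = 0.1 (craft) adds a small bonus.
    """
    return env_reward + rho * phi
```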