Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Reinforcement Learning with Option Machines
Authors: Floris den Hengst, Vincent François-Lavet, Mark Hoogendoorn, Frank van Harmelen
IJCAI 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our approach in zero-shot, single- and multi-task settings in an environment with fully specified and underspecified instructions. We find that OMs perform significantly better than or comparable to the state-of-the-art in all environments and learning settings. |
| Researcher Affiliation | Collaboration | Floris den Hengst¹,², Vincent François-Lavet², Mark Hoogendoorn², Frank van Harmelen²; ¹ING Bank N.V., ²Vrije Universiteit Amsterdam |
| Pseudocode | Yes | Algorithm 1: Main loop; Algorithm 2: Control with an Option Machine; Algorithm 3: Learning with Option Machines |
| Open Source Code | No | The paper does not contain any explicit statement about releasing source code or a link to a code repository for the described methodology. |
| Open Datasets | Yes | Two benchmark environments from [Andreas et al., 2017] are used to evaluate the approach. In the craft environment, items can be obtained by collecting resources such as wood and iron and combining them at workshop locations. In the maze environment, the agent must navigate a series of rooms with doors. |
| Dataset Splits | No | The paper describes a curriculum learning setup and mentions training on a set of tasks and evaluating on "held out tasks" for zero-shot settings, but it does not provide specific numerical train/validation/test dataset splits (e.g., percentages or sample counts) for the data used in experiments. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory, or cloud instances) used to run the experiments. |
| Software Dependencies | No | The paper mentions using "AC as the base learner" but does not specify any software libraries or tools with their version numbers (e.g., Python, PyTorch, TensorFlow versions) that were used for the implementation. It refers to an Appendix for more details, but this information is not in the main text. |
| Experiment Setup | Yes | Shaping-reward hyperparameters ρ = 0 and ρ = 0.1 were selected for the maze and craft environments, respectively. A detailed description of the environments, tasks, hyperparameters, etc. can be found in the Appendix. |