Hierarchies of Reward Machines
Authors: Daniel Furelos-Blanco, Mark Law, Anders Jonsson, Krysia Broda, Alessandra Russo
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experimental Results |
| Researcher Affiliation | Collaboration | ¹Imperial College London, UK; ²ILASP Limited, UK; ³Universitat Pompeu Fabra, Spain. |
| Pseudocode | Yes | We refer the reader to Appendix B.2 for the pseudo-code and step-by-step examples. |
| Open Source Code | Yes | The code is available at https://github.com/ertsiger/hrm-learning. |
| Open Datasets | Yes | The CRAFTWORLD domain (cf. Figure 1a) is used as a running example. In this domain, the agent ( ) can move forward or rotate 90°, staying put if it moves towards a wall. Locations are labeled with propositions from P = { , , , , , , , , , }. ... WATERWORLD (Karpathy, 2015; Sidor, 2016; Toro Icarte et al., 2018) consists of a 2D box containing 12 balls of 6 different colors (2 per color), each moving at a constant speed in a fixed direction. |
| Dataset Splits | No | The paper describes using multiple runs and instances for evaluation and a curriculum learning approach for tasks, but it does not specify explicit train/validation/test dataset splits with percentages or counts for reproduction in a traditional supervised learning sense. It focuses on task-instance pairs and average returns across episodes and runs. |
| Hardware Specification | Yes | All timed experiments ran on 3.40GHz Intel Core i7-6700 processors, while non-timed experiments have also run on 2.90GHz Intel Core i7-10700, 4.20GHz Intel Core i7-7700K, and 3.20GHz Intel Core i7-8700 processors. |
| Software Dependencies | No | The paper mentions software components and algorithms like 'Deep Q-networks', 'Double DQNs', and 'RMSprop', but it does not provide specific version numbers for software libraries, frameworks, or programming languages (e.g., PyTorch 1.x, Python 3.x). |
| Experiment Setup | Yes | Table 2: List of hyperparameters and their values. |
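
As a rough illustration of the CRAFTWORLD dynamics quoted in the Open Datasets row, the sketch below models a grid agent that can move forward or rotate by 90° and stays put when it would move into a wall. It also includes a labeling function that returns the propositions true at the agent's current cell. This is a minimal sketch under stated assumptions, not the authors' implementation (that lives in the linked repository). The `GridWorld` class, the action names `forward`/`left`/`right`, treating the grid boundary as a wall, and the proposition name `wood` are all hypothetical.

```python
from dataclasses import dataclass, field

# Headings ordered clockwise: north, east, south, west, as (row, col) deltas.
DIRECTIONS = [(-1, 0), (0, 1), (1, 0), (0, -1)]


@dataclass
class GridWorld:
    height: int
    width: int
    walls: set = field(default_factory=set)    # cells the agent cannot enter
    labels: dict = field(default_factory=dict)  # (row, col) -> frozenset of propositions

    def step(self, pos, heading, action):
        """Return the new (pos, heading) after one action.

        Hypothetical action set: "forward", "left" (rotate -90°), "right" (rotate +90°).
        """
        if action == "left":
            return pos, (heading - 1) % 4
        if action == "right":
            return pos, (heading + 1) % 4
        if action == "forward":
            dr, dc = DIRECTIONS[heading]
            nxt = (pos[0] + dr, pos[1] + dc)
            inside = 0 <= nxt[0] < self.height and 0 <= nxt[1] < self.width
            if not inside or nxt in self.walls:
                return pos, heading  # moving towards a wall (or the boundary): stay put
            return nxt, heading
        raise ValueError(f"unknown action: {action}")

    def label(self, pos):
        """Propositions observed at the agent's current location (empty if none)."""
        return self.labels.get(pos, frozenset())


# Usage: a 3x3 grid with one interior wall and one labeled cell.
world = GridWorld(height=3, width=3, walls={(1, 1)},
                  labels={(0, 2): frozenset({"wood"})})
pos, heading = (0, 0), 1                       # top-left corner, facing east
pos, heading = world.step(pos, heading, "forward")
pos, heading = world.step(pos, heading, "forward")
print(pos, world.label(pos))                   # (0, 2) frozenset({'wood'})
pos, heading = world.step(pos, heading, "forward")
print(pos)                                     # (0, 2): blocked by the boundary
```

In a reward-machine setting, the set returned by `label` at each step is what the machine would consume to decide its transitions; the sketch deliberately leaves out rewards and the machines themselves.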