Learning Robust Options
Authors: Daniel Mankowitz, Timothy Mann, Pierre-Luc Bacon, Doina Precup, Shie Mannor
AAAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present experimental results which suggest that policy iteration with linear features may have an inherent form of robustness when using coarse feature representations. In addition, we present experimental results which demonstrate that robustness helps policy iteration implemented on top of deep neural networks to generalize over a much broader range of dynamics than non-robust policy iteration. |
| Researcher Affiliation | Collaboration | 1 Technion Israel Institute of Technology, Haifa, Israel 2 Google DeepMind, London, UK 3 McGill University, Montreal, Canada |
| Pseudocode | Yes | Algorithm 1 ROPI |
| Open Source Code | No | The paper does not contain any explicit statements about the release of open-source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We performed the experiments in two well-known continuous domains called Cart Pole and Acrobot (https://gym.openai.com/) |
| Dataset Splits | No | The paper describes training on a 'nominal model' and evaluating performance over '100 episodes per parameter setting' for the simulated environments. However, it does not provide specific training/validation/test dataset splits in terms of percentages, sample counts, or explicit partitioning methodology. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions software components like 'Deep Q Network' and 'ADAM optimizer', and uses OpenAI Gym (implicitly Python-based), but it does not specify any version numbers for programming languages, libraries, or frameworks used. |
| Experiment Setup | Yes | The single network we use for each experiment is a DQN variant consisting of 3 fully-connected hidden layers with 128 weights per layer and ReLU activations. The hyper-parameter values can be found in the Appendix. We optimize the DQN loss function using the ADAM optimizer for a maximum of 3000 episodes (unless the tasks are solved earlier). |
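As a rough illustration of the reported experiment setup, the sketch below builds the described Q-network: 3 fully-connected hidden layers of 128 units with ReLU activations, trained on the DQN loss with the Adam optimizer. The paper does not name a deep learning framework, so PyTorch is an assumption here, and the class name, state/action dimensions, and learning rate are illustrative placeholders; the actual hyper-parameter values are in the paper's Appendix.

```python
import torch
import torch.nn as nn

class DQNVariantNet(nn.Module):
    """Illustrative Q-network matching the reported setup: 3 fully-connected
    hidden layers of 128 units each, with ReLU activations, followed by a
    linear output layer producing one Q-value per discrete action."""

    def __init__(self, state_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        return self.net(state)

# Training minimizes the DQN loss with Adam for at most 3000 episodes
# (stopping earlier if the task is solved). Dimensions below are
# CartPole-style placeholders, not values taken from the paper.
q_net = DQNVariantNet(state_dim=4, num_actions=2)
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)  # learning rate assumed
```

The sketch only covers the network and optimizer named in the Experiment Setup row; the robust option-learning components (Algorithm 1, ROPI) and the remaining hyper-parameters are described in the paper itself.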