Risk-Aware Transfer in Reinforcement Learning using Successor Features
Authors: Michael Gimelfarb, André Barreto, Scott Sanner, Chi-Guhn Lee
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on a discrete navigation domain and control of a simulated robotic arm demonstrate the ability of RaSFs to outperform alternative methods including SFs, when taking the risk of the learned policies into account. Empirical evaluations on discrete navigation and continuous robot control domains (Section 4) demonstrate the ability of RaSFs to better manage the trade-off between return and risk and avoid catastrophic outcomes, while providing excellent generalization on novel tasks in the same domain. |
| Researcher Affiliation | Collaboration | Michael Gimelfarb (University of Toronto, mike.gimelfarb@mail.utoronto.ca); André Barreto (DeepMind, andrebarreto@google.com); Scott Sanner (University of Toronto, ssanner@mie.utoronto.ca); Chi-Guhn Lee (University of Toronto, cglee@mie.utoronto.ca) |
| Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'. |
| Open Source Code | No | The paper does not explicitly state that the source code for their methodology is released, nor does it provide a link to a code repository. |
| Open Datasets | Yes | To evaluate the performance of RaSF, we revisit the benchmark domains in Barreto et al. [2], which have been slightly modified for learning and evaluating risk-aware behaviors. The second domain consists of a set of tasks based on the MuJoCo physics engine [42] that involve the maneuver of a robotic arm toward a fixed target location. |
| Dataset Splits | No | The paper mentions 'training' and 'test' tasks for the Reacher domain but does not provide specific details on validation splits or percentages. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU, GPU models, or memory specifications). |
| Software Dependencies | No | The paper mentions the 'MuJoCo physics engine' and 'C51' architecture but does not specify version numbers for these or any other software dependencies, such as programming languages or libraries. |
| Experiment Setup | No | We defer all experimental details to Appendix C. |