Risk-Aware Transfer in Reinforcement Learning using Successor Features

Authors: Michael Gimelfarb, André Barreto, Scott Sanner, Chi-Guhn Lee

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on a discrete navigation domain and control of a simulated robotic arm demonstrate the ability of RaSFs to outperform alternative methods including SFs, when taking the risk of the learned policies into account. Empirical evaluations on discrete navigation and continuous robot control domains (Section 4) demonstrate the ability of RaSFs to better manage the trade-off between return and risk and avoid catastrophic outcomes, while providing excellent generalization on novel tasks in the same domain.
Researcher Affiliation | Collaboration | Michael Gimelfarb, University of Toronto, mike.gimelfarb@mail.utoronto.ca; André Barreto, DeepMind, andrebarreto@google.com; Scott Sanner, University of Toronto, ssanner@mie.utoronto.ca; Chi-Guhn Lee, University of Toronto, cglee@mie.utoronto.ca
Pseudocode | No | The paper does not contain any sections or figures explicitly labeled 'Pseudocode' or 'Algorithm'.
Open Source Code | No | The paper does not explicitly state that the source code for their methodology is released, nor does it provide a link to a code repository.
Open Datasets | Yes | To evaluate the performance of RaSF, we revisit the benchmark domains in Barreto et al. [2], which have been slightly modified for learning and evaluating risk-aware behaviors. The second domain consists of a set of tasks based on the MuJoCo physics engine [42] that involve the maneuver of a robotic arm toward a fixed target location.
Dataset Splits | No | The paper mentions 'training' and 'test' tasks for the Reacher domain but does not provide specific details on validation splits or percentages.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments (e.g., CPU, GPU models, or memory specifications).
Software Dependencies | No | The paper mentions the 'MuJoCo physics engine' and 'C51' architecture but does not specify version numbers for these or any other software dependencies, such as programming languages or libraries.
Experiment Setup | No | We defer all experimental details to Appendix C.