Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Reinforcement Learning Under Moral Uncertainty
Authors: Adrien Ecoffet, Joel Lehman
ICML 2021
| Reproducibility Variable | Result | LLM-Extracted Evidence |
|---|---|---|
| Research Type | Experimental | The results illustrate (1) how such uncertainty can help curb extreme behavior from commitment to single theories and (2) several technical complications arising from attempting to ground moral philosophy in RL (e.g. how can a principled trade-off between two competing but incomparable reward functions be reached). We now illustrate various properties of the voting systems for moral uncertainty introduced in this work, and in particular focus on the trade-offs that exist between them. The code for all the experiments presented in this section can be found at https://github.com/uber-research/normative-uncertainty. |
| Researcher Affiliation | Industry | ¹Uber AI Labs, San Francisco, CA, USA; ²OpenAI, San Francisco, CA, USA (work done at Uber AI Labs). |
| Pseudocode | Yes | In our implementation, ϵ is annealed to 0 by the end of training (SI E.1). We call this algorithm Variance-SARSA (pseudocode is provided in the SI). *A hedged sketch of this selection rule appears after the table.* |
| Open Source Code | Yes | The code for all the experiments presented in this section can be found at https://github.com/uber-research/normative-uncertainty. |
| Open Datasets | No | Our experiments are based on four related gridworld environments (Fig. 1) that tease out differences between various voting systems. These environments are derived from the trolley problem (Foot, 1967), commonly used within moral philosophy to highlight moral intuitions and conflicts between ethical theories. |
| Dataset Splits | No | All the experiments in this work use short, episodic environments, allowing us to set γᵢ = 1 (i.e. undiscounted rewards) across all of them for simplicity. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) are provided for the experimental setup. |
| Software Dependencies | No | No specific software dependencies with version numbers are listed in the paper. |
| Experiment Setup | Yes | In our implementation, ϵ is annealed to 0 by the end of training (SI E.1). ... where ε is a small constant (10⁻⁶ in our experiments) to handle theories with σᵢ² = 0. |
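The two epsilons quoted above outline the shape of the algorithm: an exploration ϵ annealed to 0 over training, and a small constant ε = 10⁻⁶ that keeps variance normalization well-defined for theories whose Q-values have zero variance. The Python sketch below is not the authors' implementation (that lives in the linked repository); it shows one plausible reading of variance-normalized voting over per-theory Q-values. The function names, array shapes, and the mean-centering step are assumptions; only the 10⁻⁶ constant and the annealed exploration rate come from the quoted text.

```python
import numpy as np

EPS_VAR = 1e-6  # the paper's small constant for theories with sigma_i^2 = 0


def variance_normalized_vote(q_values: np.ndarray, credences: np.ndarray) -> np.ndarray:
    """Credence-weighted sum of each theory's variance-normalized preferences.

    q_values:  (num_theories, num_actions) per-theory action values.
    credences: (num_theories,) the agent's credence in each moral theory.
    Returns an aggregated preference per action, shape (num_actions,).
    """
    # Mean-centering is an assumption here; the normalization by
    # sqrt(variance + EPS_VAR) is the step the quoted ε protects.
    mean = q_values.mean(axis=1, keepdims=True)
    var = q_values.var(axis=1, keepdims=True)
    normalized = (q_values - mean) / np.sqrt(var + EPS_VAR)
    return credences @ normalized


def select_action(q_values, credences, explore_eps, rng):
    """Epsilon-greedy over the aggregated vote; the paper anneals this
    exploration epsilon to 0 by the end of training (SI E.1)."""
    if rng.random() < explore_eps:
        return int(rng.integers(q_values.shape[1]))
    return int(np.argmax(variance_normalized_vote(q_values, credences)))


# Usage: the second theory is indifferent (zero variance across actions),
# exactly the sigma_i^2 = 0 case the 1e-6 constant guards against.
rng = np.random.default_rng(0)
q = np.array([[1.0, 2.0, 0.5],   # theory 1 ranks action 1 highest
              [0.0, 0.0, 0.0]])  # theory 2 is indifferent
credences = np.array([0.7, 0.3])
print(select_action(q, credences, explore_eps=0.1, rng=rng))  # -> 1
```

In the usage example, the indifferent theory's normalized preferences would divide by zero without the ε term; with it, that theory simply contributes nothing to the vote, and the credence-weighted aggregate follows the theory that actually expresses a preference.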