Reinforcement Learning Under Moral Uncertainty
Authors: Adrien Ecoffet, Joel Lehman
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The results illustrate (1) how such uncertainty can help curb extreme behavior from commitment to single theories and (2) several technical complications arising from attempting to ground moral philosophy in RL (e.g. how can a principled trade-off between two competing but incomparable reward functions be reached). We now illustrate various properties of the voting systems for moral uncertainty introduced in this work, and in particular focus on the trade-offs that exist between them. The code for all the experiments presented in this section can be found at https://github.com/uber-research/normative-uncertainty. |
| Researcher Affiliation | Industry | ¹Uber AI Labs, San Francisco, CA, USA; ²OpenAI, San Francisco, CA, USA (work done at Uber AI Labs). |
| Pseudocode | Yes | In our implementation, ϵ is annealed to 0 by the end of training (SI E.1). We call this algorithm Variance-SARSA (pseudocode is provided in the SI). |
| Open Source Code | Yes | The code for all the experiments presented in this section can be found at https://github.com/uber-research/normative-uncertainty. |
| Open Datasets | No | Our experiments are based on four related gridworld environments (Fig. 1) that tease out differences between various voting systems. These environments are derived from the trolley problem (Foot, 1967), commonly used within moral philosophy to highlight moral intuitions and conflicts between ethical theories. |
| Dataset Splits | No | All the experiments in this work use short, episodic environments, allowing us to set γᵢ = 1 (i.e. undiscounted rewards) across all of them for simplicity. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) are provided for the experimental setup. |
| Software Dependencies | No | No specific software dependencies with version numbers are listed in the paper. |
| Experiment Setup | Yes | In our implementation, ϵ is annealed to 0 by the end of training (SI E.1). [...] where ε is a small constant (10⁻⁶ in our experiments) to handle theories with σᵢ² = 0. Hedged sketches of both the annealing schedule and the variance guard follow this table. |
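
The Pseudocode row notes that the exploration parameter ϵ is annealed to 0 by the end of training, with the exact schedule given in the paper's SI (Section E.1). A linear schedule is one common way to implement such annealing; the function name, the start value of 1.0, and the linear shape below are assumptions for illustration, not the paper's specified schedule.

```python
def annealed_epsilon(step: int, total_steps: int, eps_start: float = 1.0) -> float:
    """Linearly anneal exploration epsilon from eps_start down to 0.

    The start value and the linear shape are assumptions; the paper's
    actual schedule is specified in its SI (E.1).
    """
    frac = min(step / total_steps, 1.0)  # fraction of training completed
    return eps_start * (1.0 - frac)      # reaches exactly 0 at the end
```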
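The Experiment Setup row quotes the paper's guard for theories with zero Q-value variance (ε = 10⁻⁶), which arises when trading off "competing but incomparable reward functions". As a rough illustration of where such a guard typically sits, here is a minimal Python sketch of credence-weighted voting over per-theory Q-values with variance normalization. The function name, array shapes, and the exact normalization formula are assumptions made for this sketch; the paper's actual implementation is in the linked repository.

```python
import numpy as np

def variance_normalized_vote(q_values, credences, eps=1e-6):
    """Pick an action by credence-weighted voting over per-theory Q-values.

    q_values  : (n_theories, n_actions) array of Q-estimates
                (hypothetical layout, assumed for this sketch).
    credences : (n_theories,) array of credences in each ethical theory.
    eps       : small constant (10^-6 in the paper) guarding against
                theories whose Q-values have zero variance.
    """
    q = np.asarray(q_values, dtype=float)
    c = np.asarray(credences, dtype=float)
    var = q.var(axis=1)                       # per-theory variance across actions
    q_norm = q / np.sqrt(var + eps)[:, None]  # rescale incomparable reward scales
    scores = c @ q_norm                       # credence-weighted aggregate
    return int(np.argmax(scores))
```

For example, `variance_normalized_vote([[0.0, 1.0], [5.0, 5.0]], [0.5, 0.5])` returns 1: the second theory is indifferent between actions (zero variance), and `eps` keeps its normalization well defined rather than dividing by zero.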