Negotiable Reinforcement Learning for Pareto Optimal Sequential Decision-Making
Authors: Nishant Desai, Andrew Critch, Stuart J. Russell
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we implement a simple NRL agent and make empirical observations of this bet-settling behavior. Our experiments are run in a modified version of the Frozen Lake environment in OpenAI Gym [Brockman et al., 2016]. (A minimal environment sketch appears after the table.) |
| Researcher Affiliation | Academia | Nishant Desai, Center for Human-Compatible AI, University of California, Berkeley (nishantdesai@berkeley.edu); Andrew Critch, Department of EECS, University of California, Berkeley (critch@berkeley.edu); Stuart Russell, Computer Science Division, University of California, Berkeley (russell@cs.berkeley.edu) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statement about open-sourcing the code for the methodology or a link to a code repository. |
| Open Datasets | Yes | Our experiments are run in a modified version of the Frozen Lake environment in OpenAI Gym [Brockman et al., 2016]. |
| Dataset Splits | No | The paper describes a reinforcement learning setup using a modified Frozen Lake environment. It does not specify explicit training, validation, and test dataset splits as typically found in supervised learning contexts. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'OpenAI Gym [Brockman et al., 2016]' and 'point-based value iteration [Pineau et al., 2003]' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | After running point-based value iteration [Pineau et al., 2003] with a belief set of 331 points, we execute the resulting policy in this environment. The agent is initialized with initial belief state w1, corresponding to a subjective belief that the agent is in Principal 1's MDP, M1, with probability w1 and in Principal 2's MDP, M2, with probability 1 − w1 = w2. (A belief-update sketch follows the table.) |
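
The Research Type and Open Datasets rows above note that the experiments build on the Frozen Lake environment from OpenAI Gym. Below is a minimal sketch of loading and stepping the stock environment, assuming the classic (pre-0.26) Gym API; the paper's modified variant and the principals' reward functions are not released, so none of this is the authors' code.

```python
# Minimal sketch: load and step the stock FrozenLake environment from OpenAI Gym.
# Assumes the classic (pre-0.26) Gym API where reset() returns an observation
# and step() returns (obs, reward, done, info). The paper's modified variant of
# Frozen Lake is not published, so only the base environment is shown here.
import gym

env = gym.make("FrozenLake-v0")       # 4x4 grid world with discrete states and actions
obs = env.reset()
done = False
episode_return = 0.0
while not done:
    action = env.action_space.sample()            # random policy, just to exercise the env
    obs, reward, done, info = env.step(action)
    episode_return += reward
print("episode return:", episode_return)
```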
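
The Experiment Setup row describes an agent whose state includes a subjective belief w1 that the true environment is Principal 1's MDP M1, with w2 = 1 − w1 for Principal 2's MDP M2. The sketch below shows one way such a belief could be updated by Bayes' rule as observations arrive; the observation likelihoods are hypothetical placeholders, and the point-based value iteration solver [Pineau et al., 2003] over 331 belief points used in the paper is not reproduced.

```python
# Sketch of the two-MDP belief state from the Experiment Setup row. The agent
# tracks P(M1) = w1 and P(M2) = 1 - w1 and updates the belief by Bayes' rule
# after each observation. The likelihood values below are hypothetical; planning
# over this belief space with point-based value iteration is omitted.
import numpy as np

def update_belief(w1, lik_m1, lik_m2):
    """Posterior P(M1) given the latest observation's likelihood under each MDP."""
    unnorm = np.array([w1 * lik_m1, (1.0 - w1) * lik_m2])
    return float(unnorm[0] / unnorm.sum())

w1 = 0.5                                        # initial subjective belief in M1
w1 = update_belief(w1, lik_m1=0.8, lik_m2=0.2)  # example observation likelihoods
print("posterior P(M1):", round(w1, 3))
```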