Optimal Policies Tend To Seek Power

Authors: Alex Turner, Logan Smith, Rohin Shah, Andrew Critch, Prasad Tadepalli

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our contributions are threefold. First, we develop a formal theory of power-seeking... Second, we provide empirical evidence of power-seeking policies in tabular and deep reinforcement learning (RL) agents via a suite of Gridworld experiments... Our theoretical results explain why optimal policies tend to seek power, and our empirical demonstrations indicate that this phenomenon is already present in simple environments with current RL methods.
Researcher Affiliation | Collaboration | Alex Turner (1), Zachary Kent (1), Andrew Critch (2), Richard Ngo (3), David Lindner (1), Lawrence Chan (1), David Krueger (4), Jan Leike (3). Affiliations: (1) DeepMind, (2) UC Berkeley, (3) OpenAI, (4) University of Cambridge, Vector Institute, CIFAR.
Pseudocode | No | No structured pseudocode or algorithm blocks are present in the paper.
Open Source Code | No | Code for the experiments and plots is available upon request.
Open Datasets | Yes | We consider a simple Gridworld environment... For our MiniGrid experiments, we use the MiniGrid library... (see the environment sketch after this table)
Dataset Splits | No | No specific dataset split information (percentages, sample counts, or detailed splitting methodology) is provided for train, validation, or test sets.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, processor types, memory amounts) used for running experiments are provided.
Software Dependencies | No | The paper mentions the 'MiniGrid library' and 'JAX' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Appendix B: Experimental Details... Specifically, we use an Adam optimizer with a learning rate of 10^-4 and a batch size of 32. The discount factor γ is 0.99. For the MiniGrid experiments, we train the agent for 200 million environment steps. For the tabular Gridworld experiments, we use a learning rate of 0.1 for Q-learning. (see the configuration sketch after this table)
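The Open Datasets row notes only that the gridworld experiments build on the MiniGrid library. Below is a minimal sketch of instantiating such an environment; the environment ID, the ImgObsWrapper, and the use of the current Farama minigrid/Gymnasium API are assumptions rather than details taken from the paper (the original experiments may have used the older gym-minigrid interface).

```python
# Illustrative only: the paper does not specify an environment ID or wrappers.
import gymnasium as gym
import minigrid  # noqa: F401  (importing registers the MiniGrid-* env IDs)
from minigrid.wrappers import ImgObsWrapper

# Hypothetical environment choice; ImgObsWrapper keeps only the image observation.
env = ImgObsWrapper(gym.make("MiniGrid-Empty-8x8-v0"))

obs, info = env.reset(seed=0)
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # random placeholder policy
    obs, reward, terminated, truncated, info = env.step(action)
env.close()
```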
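The Experiment Setup row quotes concrete hyperparameters: Adam with a learning rate of 10^-4, a batch size of 32, a discount factor of 0.99, 200 million environment steps for MiniGrid, and a learning rate of 0.1 for tabular Q-learning. The sketch below simply collects those values into a configuration object and shows a single tabular Q-learning update under that configuration; the names TrainConfig and q_update are hypothetical, and the surrounding training loop is omitted.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class TrainConfig:
    # Values quoted in the Experiment Setup row; everything else is assumed.
    adam_learning_rate: float = 1e-4   # deep RL (MiniGrid) optimizer step size
    batch_size: int = 32
    discount: float = 0.99             # gamma
    env_steps: int = 200_000_000       # MiniGrid training budget
    tabular_q_lr: float = 0.1          # tabular Gridworld Q-learning step size


def q_update(q: np.ndarray, s: int, a: int, r: float, s_next: int,
             cfg: TrainConfig) -> None:
    """One tabular Q-learning update: Q(s,a) += alpha * (TD target - Q(s,a))."""
    td_target = r + cfg.discount * q[s_next].max()
    q[s, a] += cfg.tabular_q_lr * (td_target - q[s, a])


# Usage: a 10-state, 4-action table updated with one fictitious transition.
cfg = TrainConfig()
q_table = np.zeros((10, 4))
q_update(q_table, s=0, a=1, r=1.0, s_next=3, cfg=cfg)
```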