Goal Misgeneralization in Deep Reinforcement Learning

Authors: Lauro Langosco Di Langosco, Jack Koch, Lee D Sharkey, Jacob Pfau, David Krueger

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide the first empirical demonstrations of goal misgeneralization to highlight and illustrate this phenomenon. We experimentally demonstrate that goal misgeneralization can be a significant issue, even when capability generalization failures are rare.
Researcher Affiliation | Academia | 1 University of Cambridge, 2 University of Tübingen, 3 University of Edinburgh.
Pseudocode | No | The paper describes the methods used (e.g., PPO) but does not provide structured pseudocode blocks or algorithms.
Open Source Code | Yes | Our code can be found at https://github.com/JacobPfau/procgenAISC (environments) and https://github.com/jbkjr/train-procgen-pytorch (training).
Open Datasets | Yes | Except in Section 3.5, all environments are adapted from the Procgen environment suite (Cobbe et al., 2019).
Dataset Splits | No | The paper describes training and test environments and mentions 'validation performance' in one figure, but it does not provide specific percentages, sample counts, or detailed methodology for dataset splits (e.g., train/validation/test splits).
Hardware Specification | Yes | Each training run required approximately 30 GPU hours of compute on a V100.
Software Dependencies | No | The paper mentions 'PyTorch (Paszke et al., 2019)' but does not provide explicit version numbers for this or any other software dependency within the text.
Experiment Setup | Yes | Table 2 (hyperparameters): env. distribution mode = hard; γ = 0.999; λ = 0.95; learning rate = 5×10⁻⁴; timesteps per rollout = 256; epochs per rollout = 3; minibatches per epoch = 8; minibatch size = 2048; entropy bonus (k_H) = 0.01; PPO clip range = 0.2; reward normalization = yes; workers = 4; environments per worker = 64; total timesteps = 200M; architecture = Impala; LSTM = no; frame stack = no.
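
As a rough illustration of how the Table 2 settings fit together, below is a minimal sketch that collects them into a Python config and shows the clipped PPO surrogate they parameterize. The dictionary keys and the ppo_loss helper are hypothetical; only the numeric values come from Table 2, and the authors' actual training code (train-procgen-pytorch) defines its own interfaces.

# Hedged sketch: Table 2 hyperparameters as a config dict, plus the clipped
# PPO surrogate objective they plug into. Names are illustrative, not the
# authors' actual code; values are taken from Table 2.
import torch

CONFIG = {
    "distribution_mode": "hard",     # Procgen level distribution
    "gamma": 0.999,                  # discount factor
    "lam": 0.95,                     # GAE lambda
    "learning_rate": 5e-4,
    "timesteps_per_rollout": 256,
    "epochs_per_rollout": 3,
    "minibatches_per_epoch": 8,
    "minibatch_size": 2048,
    "entropy_coef": 0.01,            # entropy bonus k_H
    "clip_range": 0.2,
    "normalize_reward": True,
    "n_workers": 4,
    "envs_per_worker": 64,
    "total_timesteps": 200_000_000,
    "architecture": "impala",        # Impala CNN, no LSTM, no frame stack
}

def ppo_loss(new_logp, old_logp, advantages, entropy,
             clip_range=CONFIG["clip_range"],
             entropy_coef=CONFIG["entropy_coef"]):
    """Clipped PPO surrogate (to be minimized), with entropy bonus."""
    ratio = torch.exp(new_logp - old_logp)
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_range, 1.0 + clip_range) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    return policy_loss - entropy_coef * entropy.mean()

The value-function loss, reward normalization, and the rollout machinery (4 workers × 64 environments for 200M total timesteps) are omitted here; in a full training loop they would sit around this objective.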