Observational Overfitting in Reinforcement Learning

Authors: Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experiments expose intriguing properties especially with regards to implicit regularization, and also corroborate results from previous works in RL generalization and supervised learning (SL)."
Researcher Affiliation | Collaboration | Xingyou Song, Yiding Jiang, Stephen Tu, Behnam Neyshabur (Google) {xingyousong,ydjiang,stephentu,neyshabur}@google.com; Yilun Du (MIT) yilundu@mit.edu
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-formatted procedures).
Open Source Code | No | The paper only references third-party repositories for the models and tools it uses; it does not provide access to the authors' own source code for the methodology presented.
Open Datasets | Yes | "We study observational overfitting with linear quadratic regulators (LQR) in a synthetic environment and neural networks such as multi-layer perceptrons (MLPs) and convolutions in classic Gym environments." (A hedged sketch of such a projected observation follows the table.)
Dataset Splits | No | The paper mentions 'training levels' and 'test time' for environments such as Gym and CoinRun (e.g., '10 training levels'), but does not give the percentages, sample counts, or split methodology needed to reproduce train/validation/test splits. (An illustrative level split is sketched after the table.)
Hardware Specification | No | The paper does not give specific hardware details such as GPU/CPU models, memory, or the type of computing resources used for the experiments; it only mentions 'GPU' in passing.
Software Dependencies | No | The paper names software components such as TensorFlow and PPO2 but provides no version numbers for them or for any other dependencies, which would be necessary for reproducibility.
Experiment Setup | Yes | Appendix A.3.4 (PPO parameters) lists the PPO2 hyperparameters used for the projected Gym tasks (a sketch wiring them into PPO2 follows):

PPO2 Hyperparameter | Value
nsteps | 2048
nenvs | 16
nminibatches | 64
λ | 0.95
γ | 0.99
noptepochs | 10
entropy coefficient | 0.0
learning rate | 3 × 10⁻⁴
vf coefficient | 0.5
max-grad-norm | 0.5
total time steps | Varying
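For concreteness, the kind of "projected" observation the paper studies can be sketched as a Gym wrapper that concatenates a level-invariant projection of the underlying state with a per-level random one. This is a minimal illustration, not the authors' released code; the class name, output dimension, and seeding scheme are assumptions.

```python
import gym
import numpy as np


class ProjectedObsWrapper(gym.ObservationWrapper):
    """Sketch: lift the true low-dimensional state into a larger observation
    made of a 'signal' block shared across levels and a 'noise' block drawn
    per level. Names and dimensions here are illustrative assumptions."""

    def __init__(self, env, out_dim=64, level_seed=0):
        super().__init__(env)
        state_dim = env.observation_space.shape[0]
        shared_rng = np.random.RandomState(12345)      # same for every level
        level_rng = np.random.RandomState(level_seed)  # differs per level
        self.W_signal = shared_rng.randn(out_dim, state_dim)
        self.W_noise = level_rng.randn(out_dim, state_dim)
        self.observation_space = gym.spaces.Box(
            low=-np.inf, high=np.inf, shape=(2 * out_dim,), dtype=np.float32)

    def observation(self, obs):
        # Concatenate the level-invariant and level-dependent projections.
        return np.concatenate(
            [self.W_signal @ obs, self.W_noise @ obs]).astype(np.float32)
```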
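Building on that sketch, the "10 training levels" mentioned under Dataset Splits would amount to a split over level seeds rather than over a held-out dataset; the environment and the specific seed ranges below are purely illustrative.

```python
train_seeds = range(10)          # '10 training levels'
test_seeds = range(1000, 1010)   # illustrative held-out levels

train_envs = [ProjectedObsWrapper(gym.make("CartPole-v1"), level_seed=s)
              for s in train_seeds]
test_envs = [ProjectedObsWrapper(gym.make("CartPole-v1"), level_seed=s)
             for s in test_seeds]
```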
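The hyperparameter names in the A.3.4 table map directly onto the keyword arguments of OpenAI Baselines' ppo2.learn, so the setup can be sketched as below. The environment, the network choice, and the total_timesteps value (reported only as "Varying") are assumptions, not values taken from the paper.

```python
import gym
from baselines.common.vec_env.dummy_vec_env import DummyVecEnv
from baselines.ppo2 import ppo2

# nenvs = 16 is realized by vectorizing 16 environment copies.
venv = DummyVecEnv([lambda: gym.make("CartPole-v1") for _ in range(16)])

model = ppo2.learn(
    network="mlp",             # assumption: stand-in for the paper's networks
    env=venv,
    nsteps=2048,
    nminibatches=64,
    lam=0.95,                  # λ
    gamma=0.99,                # γ
    noptepochs=10,
    ent_coef=0.0,              # entropy coefficient
    lr=3e-4,                   # learning rate 3 × 10⁻⁴
    vf_coef=0.5,
    max_grad_norm=0.5,
    total_timesteps=int(1e6),  # the paper leaves this as 'Varying'
)
```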