Observational Overfitting in Reinforcement Learning
Authors: Xingyou Song, Yiding Jiang, Stephen Tu, Yilun Du, Behnam Neyshabur
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments expose intriguing properties especially with regards to implicit regularization, and also corroborate results from previous works in RL generalization and supervised learning (SL). |
| Researcher Affiliation | Collaboration | Xingyou Song, Yiding Jiang, Stephen Tu, Behnam Neyshabur (Google) {xingyousong,ydjiang,stephentu,neyshabur}@google.com; Yilun Du (MIT) yilundu@mit.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures). |
| Open Source Code | No | The paper only references third-party repositories for models or tools used, not explicit access to the authors' own source code for the methodology presented in the paper. |
| Open Datasets | Yes | We study observational overfitting with linear quadratic regulators (LQR) in a synthetic environment and neural networks such as multi-layer perceptrons (MLPs) and convolutions in classic Gym environments. |
| Dataset Splits | No | The paper mentions 'training levels' and 'test time' for environments like Gym and Coin Run (e.g., '10 training levels'), but does not provide specific percentages, sample counts, or detailed methodology for train/validation/test splits needed for reproduction. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory specifications, or types of computing resources used for running experiments. It only vaguely mentions 'GPU'. |
| Software Dependencies | No | The paper mentions software components like 'TensorFlow' and 'PPO2' but does not provide specific version numbers for these or any other ancillary software dependencies, which are necessary for reproducibility. |
| Experiment Setup | Yes | A.3.4 PPO PARAMETERS: For the projected gym tasks, the PPO2 hyperparameters were: nsteps = 2048, nenvs = 16, nminibatches = 64, λ = 0.95, γ = 0.99, noptepochs = 10, entropy = 0.0, learning rate = 3e-4, vf coefficient = 0.5, max-grad-norm = 0.5, total timesteps varying. |
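The PPO2 hyperparameters reported in the paper's Appendix A.3.4 can be collected into a single configuration dict. This is a sketch for reference only; the key names loosely follow OpenAI Baselines' `ppo2` argument conventions, but any mapping beyond the values quoted in the table is an assumption.

```python
# Sketch of the PPO2 hyperparameters from Appendix A.3.4 of the paper.
# Values are taken from the table above; key names are assumed to match
# OpenAI Baselines' ppo2 arguments and may differ from the authors' code.
ppo2_hyperparams = {
    "nsteps": 2048,        # rollout length per environment
    "nenvs": 16,           # number of parallel environments
    "nminibatches": 64,    # minibatches per policy update
    "lam": 0.95,           # GAE lambda
    "gamma": 0.99,         # discount factor
    "noptepochs": 10,      # optimization epochs per update
    "ent_coef": 0.0,       # entropy bonus coefficient
    "lr": 3e-4,            # learning rate
    "vf_coef": 0.5,        # value-function loss coefficient
    "max_grad_norm": 0.5,  # gradient clipping threshold
    # total_timesteps varies by experiment and is not fixed in the paper
}

# Effective batch size per update: nsteps * nenvs transitions,
# split into nminibatches minibatches.
batch_size = ppo2_hyperparams["nsteps"] * ppo2_hyperparams["nenvs"]
minibatch_size = batch_size // ppo2_hyperparams["nminibatches"]
```

Note that the paper leaves total timesteps unspecified ("Varying"), so any full training run would need that value chosen per experiment.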