Improving Generalization in Reinforcement Learning with Mixture Regularization
Authors: Kaixin Wang, Bingyi Kang, Jie Shao, Jiashi Feng
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify its effectiveness on improving generalization by conducting extensive experiments on the large-scale Procgen benchmark. Results show mixreg outperforms the well-established baselines on unseen testing environments by a large margin. |
| Researcher Affiliation | Collaboration | Kaixin Wang¹, Bingyi Kang¹, Jie Shao², Jiashi Feng¹ (¹National University of Singapore, ²ByteDance AI Lab) |
| Pseudocode | No | The paper describes mathematical formulations and algorithms conceptually but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/kaixin96/mixreg. |
| Open Datasets | Yes | We evaluate mixreg on the recently introduced Procgen Benchmark [2]. [2] Cobbe, K., Hesse, C., Hilton, J., and Schulman, J. (2019). Leveraging procedural generation to benchmark reinforcement learning. arXiv preprint arXiv:1912.01588. |
| Dataset Splits | No | The paper describes training on 'a limited set of 500 levels' and evaluating on 'unseen levels at testing', indicating a train/test split. However, it does not explicitly mention a validation set or a validation split for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms like 'Proximal Policy Optimization (PPO) [18]' and 'Rainbow [8]' and a 'convolutional network architecture proposed in IMPALA [5]', but it does not specify software versions for these or other libraries/environments (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper states, 'Hyperparameters, full training curves and other implementation details are provided in the supplementary material.' While it describes general aspects of the experimental setup (e.g., 500 level generalization protocol, choice of environments, algorithms, and network architecture), it explicitly defers concrete hyperparameter values and detailed training configurations to supplementary material, meaning they are not in the main text. |
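Although the paper contains no labeled pseudocode block (see the Pseudocode row above), its core idea is compact: interpolate pairs of training observations and, with the same coefficient, their associated supervision signals (e.g., value targets), with the coefficient drawn from a Beta(α, α) distribution. The following is a minimal NumPy sketch under those assumptions; the helper name `mixreg_batch`, the `alpha` default, and the per-sample coefficient layout are illustrative choices, not the authors' reference implementation (which is available at the GitHub link above).

```python
import numpy as np

def mixreg_batch(obs, targets, alpha=0.2, rng=None):
    """Mixture-regularization (mixreg) sketch for one RL training batch.

    Convexly combines each observation with a randomly paired partner,
    and mixes the corresponding supervision signal with the same
    coefficient. Interface and defaults are illustrative assumptions.
    """
    if rng is None:
        rng = np.random.default_rng()
    batch_size = obs.shape[0]
    # Per-sample interpolation coefficients from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=batch_size)
    # Random pairing: mix each sample with a shuffled partner.
    perm = rng.permutation(batch_size)
    # Broadcast lam over the trailing observation dimensions (H, W, C).
    lam_obs = lam.reshape(-1, *([1] * (obs.ndim - 1)))
    mixed_obs = lam_obs * obs + (1.0 - lam_obs) * obs[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_obs.astype(np.float32), mixed_targets

# Illustrative usage with synthetic Procgen-sized data (64x64 RGB frames).
obs = np.random.randint(0, 256, size=(8, 64, 64, 3)).astype(np.float32)
returns = np.random.randn(8).astype(np.float32)
mixed_obs, mixed_returns = mixreg_batch(obs, returns, alpha=0.2)
```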
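For the 500-level generalization protocol noted in the Dataset Splits row, Procgen exposes `num_levels` and `start_level` keyword arguments through Gym. Below is a minimal sketch of the train/test split, assuming the `procgen` package and a compatible Gym version; the choice of game (`coinrun`) is illustrative, not the paper's exact configuration.

```python
import gym  # requires the `procgen` package to be installed

# Train on a fixed set of 500 levels (level seeds 0-499), matching the
# paper's "limited set of 500 levels" training protocol.
train_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=500,
    start_level=0,
)

# Test on the full level distribution: num_levels=0 means unbounded,
# so evaluation episodes include levels never seen during training.
test_env = gym.make("procgen:procgen-coinrun-v0", num_levels=0, start_level=0)
```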