Improving Generalization in Reinforcement Learning with Mixture Regularization
Authors: Kaixin Wang, Bingyi Kang, Jie Shao, Jiashi Feng
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We verify its effectiveness on improving generalization by conducting extensive experiments on the large-scale Procgen benchmark. Results show mixreg outperforms the well-established baselines on unseen testing environments by a large margin. |
| Researcher Affiliation | Collaboration | Kaixin Wang¹, Bingyi Kang¹, Jie Shao², Jiashi Feng¹ (¹National University of Singapore, ²ByteDance AI Lab) |
| Pseudocode | No | The paper describes mathematical formulations and algorithms conceptually but does not include a clearly labeled pseudocode or algorithm block. |
| Open Source Code | Yes | Code is available at https://github.com/kaixin96/mixreg. |
| Open Datasets | Yes | We evaluate mixreg on the recently introduced Procgen Benchmark [2]. [2] Cobbe, K., Hesse, C., Hilton, J., and Schulman, J. (2019). Leveraging procedural generation to benchmark reinforcement learning. arXiv preprint arXiv:1912.01588. |
| Dataset Splits | No | The paper describes training on 'a limited set of 500 levels' and evaluating on 'unseen levels at testing', indicating a train/test split. However, it does not explicitly mention a validation set or a validation split for hyperparameter tuning or model selection. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU, GPU models, memory, cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions using specific algorithms like 'Proximal Policy Optimization (PPO) [18]' and 'Rainbow [8]' and a 'convolutional network architecture proposed in IMPALA [5]', but it does not specify software versions for these or other libraries/environments (e.g., Python, PyTorch, TensorFlow versions). |
| Experiment Setup | No | The paper states, 'Hyperparameters, full training curves and other implementation details are provided in the supplementary material.' While it describes general aspects of the experimental setup (e.g., 500 level generalization protocol, choice of environments, algorithms, and network architecture), it explicitly defers concrete hyperparameter values and detailed training configurations to supplementary material, meaning they are not in the main text. |
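Although the paper contains no labeled pseudocode block (see the Pseudocode row above), its core idea is compact: interpolate pairs of training observations and, with the same coefficient, their associated supervision signals (e.g., value targets), with the coefficient drawn from a Beta(α, α) distribution. The following is a minimal NumPy sketch under those assumptions; the helper name `mixreg_batch`, the `alpha` default, and the per-sample coefficient layout are illustrative choices, not the authors' reference implementation (which is available at the GitHub link above).

```python
import numpy as np

def mixreg_batch(obs, targets, alpha=0.2, rng=None):
    """Mixture-regularization (mixreg) sketch for one RL training batch.

    Convexly combines each observation with a randomly paired partner,
    and mixes the corresponding supervision signal with the same
    coefficient. Interface and defaults are illustrative assumptions.
    """
    if rng is None:
        rng = np.random.default_rng()
    batch_size = obs.shape[0]
    # Per-sample interpolation coefficients from Beta(alpha, alpha).
    lam = rng.beta(alpha, alpha, size=batch_size)
    # Random pairing: mix each sample with a shuffled partner.
    perm = rng.permutation(batch_size)
    # Broadcast lam over the trailing observation dimensions (H, W, C).
    lam_obs = lam.reshape(-1, *([1] * (obs.ndim - 1)))
    mixed_obs = lam_obs * obs + (1.0 - lam_obs) * obs[perm]
    mixed_targets = lam * targets + (1.0 - lam) * targets[perm]
    return mixed_obs.astype(np.float32), mixed_targets

# Illustrative usage with synthetic Procgen-sized data (64x64 RGB frames).
obs = np.random.randint(0, 256, size=(8, 64, 64, 3)).astype(np.float32)
returns = np.random.randn(8).astype(np.float32)
mixed_obs, mixed_returns = mixreg_batch(obs, returns, alpha=0.2)
```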
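For the 500-level generalization protocol noted in the Dataset Splits row, Procgen exposes `num_levels` and `start_level` keyword arguments through Gym. Below is a minimal sketch of the train/test split, assuming the `procgen` package and a compatible Gym version; the choice of game (`coinrun`) is illustrative, not the paper's exact configuration.

```python
import gym  # requires the `procgen` package to be installed

# Train on a fixed set of 500 levels (level seeds 0-499), matching the
# paper's "limited set of 500 levels" training protocol.
train_env = gym.make(
    "procgen:procgen-coinrun-v0",
    num_levels=500,
    start_level=0,
)

# Test on the full level distribution: num_levels=0 means unbounded,
# so evaluation episodes include levels never seen during training.
test_env = gym.make("procgen:procgen-coinrun-v0", num_levels=0, start_level=0)
```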