Redeeming Intrinsic Rewards via Constrained Optimization

Authors: Eric Chen, Zhang-Wei Hong, Joni Pajarinen, Pulkit Agrawal

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Consistent performance gains across sixty-one ATARI games validate our claim.
Researcher Affiliation | Collaboration | Eric Chen, Zhang-Wei Hong*, Joni Pajarinen & Pulkit Agrawal (Improbable AI Lab, Massachusetts Institute of Technology; MIT-IBM Watson AI Lab; Aalto University; NSF AI Institute for AI and Fundamental Interactions (IAIFI))
Pseudocode | Yes | Algorithm 1: Extrinsic-Intrinsic Policy Optimization (EIPO)
Open Source Code | Yes | The code is available at https://github.com/Improbable-AI/eipo.
Open Datasets | Yes | We conducted experiments on ATARI games [20], the de-facto benchmark for exploration methods [4, 11].
Dataset Splits | No | The paper uses standard ATARI benchmarks but does not explicitly detail the train/validation/test splits (e.g., percentages or specific counts) needed for reproducibility.
Hardware Specification | Yes | When working with image inputs (e.g., ATARI), sharing the convolutional neural network (CNN) backbone between E and E+I helps save memory, which is important when using GPUs (in our case, an NVIDIA RTX 3090Ti).
Software Dependencies | No | The paper mentions using PPO [13] and Pycolab [19] but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Pseudo-code can be found in Algorithm 2, and full implementation details including hyperparameters can be found in Appendix A.2.
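The hardware note above mentions sharing a CNN backbone between the extrinsic-only (E) policy and the extrinsic-plus-intrinsic (E+I) policy to save GPU memory. The sketch below illustrates one way such a shared encoder with separate policy and value heads could be laid out; it is a minimal sketch, not the authors' implementation, and the class and attribute names (SharedBackboneActorCritic, pi_E, pi_EI, etc.) are assumptions made for this example. It assumes standard ATARI preprocessing (4 stacked 84x84 grayscale frames).

```python
# Minimal sketch (not from the EIPO repository): a single convolutional
# encoder shared by the extrinsic-only (E) and extrinsic+intrinsic (E+I)
# policies, each with its own policy and value heads.
import torch
import torch.nn as nn


class SharedBackboneActorCritic(nn.Module):
    def __init__(self, n_actions: int):
        super().__init__()
        # Nature-DQN style encoder; storing and training it once for both
        # policies is where the memory saving comes from.
        self.backbone = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
        )
        # Separate heads for the E policy and the E+I policy.
        self.pi_E, self.v_E = nn.Linear(512, n_actions), nn.Linear(512, 1)
        self.pi_EI, self.v_EI = nn.Linear(512, n_actions), nn.Linear(512, 1)

    def forward(self, obs: torch.Tensor, use_mixed_policy: bool):
        # obs: (batch, 4, 84, 84) float tensor of stacked frames in [0, 1].
        h = self.backbone(obs)
        if use_mixed_policy:
            return self.pi_EI(h), self.v_EI(h)
        return self.pi_E(h), self.v_E(h)
```

In this layout only the small linear heads are duplicated; the convolutional encoder's parameters exist once, which is the saving the quoted hardware passage refers to when both policies are trained on a single GPU.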