Redeeming Intrinsic Rewards via Constrained Optimization
Authors: Eric Chen, Zhang-Wei Hong, Joni Pajarinen, Pulkit Agrawal
NeurIPS 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Consistent performance gains across sixty-one ATARI games validate our claim. |
| Researcher Affiliation | Collaboration | Eric Chen, Zhang-Wei Hong*, Joni Pajarinen & Pulkit Agrawal; Improbable AI Lab, Massachusetts Institute of Technology; MIT-IBM Watson AI Lab; Aalto University; NSF AI Institute for AI and Fundamental Interactions (IAIFI) |
| Pseudocode | Yes | Algorithm 1 Extrinsic-Intrinsic Policy Optimization (EIPO); a hedged sketch of the tuning loop follows the table. |
| Open Source Code | Yes | The code is available at https://github.com/Improbable-AI/eipo. |
| Open Datasets | Yes | We conducted experiments on ATARI games [20], the de facto benchmark for exploration methods [4, 11]. |
| Dataset Splits | No | The paper uses standard ATARI benchmarks but does not explicitly detail the train/validation/test splits (e.g., percentages or specific counts) for reproducibility. |
| Hardware Specification | Yes | When working with image inputs (e.g., ATARI), sharing the convolutional neural network (CNN) backbone between E and E+I helps save memory, which is important when using GPUs (in our case, an NVIDIA RTX 3090Ti). |
| Software Dependencies | No | The paper mentions using PPO [13] and Pycolab [19], but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Pseudo-code can be found in Algorithm 2, and full implementation details including hyperparameters can be found in Appendix A.2. |
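
For orientation, the sketch below illustrates the kind of coefficient-tuning loop implied by the Pseudocode row above: an exploration policy is trained on the mixed reward r_E + α·r_I, and the multiplier α is adjusted so the intrinsic bonus is used only while it improves extrinsic return. This is a minimal sketch under stated assumptions, not the authors' Algorithm 1; the function names (`mixed_reward`, `update_alpha`) and the exact dual-ascent-style update rule are hypothetical (the real implementation is at https://github.com/Improbable-AI/eipo).

```python
# Hedged sketch of the coefficient-tuning idea behind an
# extrinsic-intrinsic constrained objective. The update rule and all
# names here are illustrative assumptions, not the paper's exact code.

def mixed_reward(r_ext: float, r_int: float, alpha: float) -> float:
    """Reward used for the mixed (extrinsic + intrinsic) policy's rollouts."""
    return r_ext + alpha * r_int

def update_alpha(alpha: float, j_ext_mixed: float, j_ext_pure: float,
                 lr: float = 0.01) -> float:
    """Dual-ascent-style update (assumed): grow alpha while the mixed
    policy earns more extrinsic return than the extrinsic-only policy,
    and shrink it toward zero once exploration stops paying off."""
    gap = j_ext_mixed - j_ext_pure  # > 0 means the intrinsic bonus is helping
    return max(0.0, alpha + lr * gap)

if __name__ == "__main__":
    alpha = 0.5
    # Toy extrinsic returns: exploration helps early, then stops helping.
    for j_mixed, j_pure in [(1.5, 1.0), (1.1, 1.0), (0.8, 1.0)]:
        alpha = update_alpha(alpha, j_mixed, j_pure)
        print(f"alpha = {alpha:.3f}")
```

In this toy run, α rises while the mixed policy outperforms the extrinsic-only one and decays afterward, matching the paper's stated goal of suppressing the intrinsic reward when exploration is unnecessary and increasing it when exploration is required.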