Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Redeeming intrinsic rewards via constrained optimization
Authors: Eric Chen, Zhang-Wei Hong, Joni Pajarinen, Pulkit Agrawal
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Consistent performance gains across sixty-one ATARI games validate our claim. |
| Researcher Affiliation | Collaboration | Eric Chen, Zhang-Wei Hong*, Joni Pajarinen & Pulkit Agrawal; Improbable AI Lab, Massachusetts Institute of Technology; MIT-IBM Watson AI Lab; Aalto University; NSF AI Institute for AI and Fundamental Interactions (IAIFI) |
| Pseudocode | Yes | Algorithm 1 Extrinsic-Intrinsic Policy Optimization (EIPO) |
| Open Source Code | Yes | The code is available at https://github.com/Improbable-AI/eipo. |
| Open Datasets | Yes | We conducted experiments on ATARI games [20], the de-facto benchmark for exploration methods [4, 11]. |
| Dataset Splits | No | The paper uses standard ATARI benchmarks but does not explicitly detail the train/validation/test splits (e.g., percentages or specific counts) for reproducibility. |
| Hardware Specification | Yes | When working with image inputs (e.g., ATARI), sharing the convolutional neural network (CNN) backbone between E and E+I helps save memory, which is important when using GPUs (in our case, an NVIDIA RTX 3090Ti). |
| Software Dependencies | No | The paper mentions using PPO [13] and Pycolab [19], but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Pseudo-code can be found in Algorithm 2, and full implementation details including hyperparameters can be found in Appendix A.2. |