Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Introspective Experience Replay: Look Back When Surprised

Authors: Ramnath Kumar, Dheeraj Mysore Nagaraj

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through empirical evaluations, we demonstrate that IER with neural function approximation yields reliable and superior performance compared to UER, PER, and hindsight experience replay (HER) across most tasks. Our main findings are summarized below: Better Performance Against SOTA: Our proposed methodology (IER) outperforms previous state-of-the-art baselines such as PER, UER and HER on most environments (see Table 1, Section 5)."
Researcher Affiliation | Industry | Ramnath Kumar (Google Research); Dheeraj Nagaraj (Google Research)
Pseudocode | Yes | "Algorithm 1: Our proposed Introspective Experience Replay (IER) for Reinforcement Learning"
Open Source Code | Yes | "Our source code is made available for additional reference: https://github.com/google-research/look-back-when-surprised"
Open Datasets | Yes | "In this paper, we work with thirteen datasets, all of which are open-sourced in gym (https://github.com/openai/gym). More information about the environments is available in Appendix A."
Dataset Splits | No | The paper uses various OpenAI Gym environments and refers to
Hardware Specification | Yes | "All runs have been run using the A100-SXM4-40GB, TITAN RTX, and V100 GPUs."
Software Dependencies | No | The paper mentions using DQN, DDPG, and TD3 algorithms, and OpenAI Gym (https://github.com/openai/gym), but does not provide specific version numbers for any software libraries, programming languages, or environments used.
Experiment Setup | Yes | "Hyperparameters: Refer to Appendix B for the exact hyperparameters used. Across all our experiments on various environments, we use a standard setting for all the different experience replay buffers. This classic setting is set so we can reproduce state-of-the-art performance using UER on the respective environment. For most of our experiments, we set the uniform mixing fraction (p) from Algorithm 1 to be 0. We use a non-zero p value only for a few environments to avoid becoming overtly selective while training, as described in Appendix B. For PER, we tune the α and β hyperparameters used in the Schaul et al. (2015) paper across all environments other than Atari. The default values of α = 0.4 and β = 0.6 are robust on Atari environments as shown by extensive hyperparameter search by Schaul et al. (2015). We detail the results from our grid search in Appendix C.2. Tables 5, 6, 7 and 8 list detailed hyperparameters for different algorithms and environments."
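Algorithm 1 itself is not reproduced in this card. As a rough illustration of the "look back when surprised" idea and of the uniform mixing fraction p mentioned above, the sketch below is our own reading, not the authors' code: all names and details (TD error as the surprise signal, the exact look-back window) are assumptions. It picks the most surprising transition in the buffer, replays the transitions leading up to it, and mixes in a p-fraction of uniform samples.

```python
import random

def ier_sample(buffer, td_errors, batch_size, p=0.0):
    """Hypothetical sketch of Introspective Experience Replay (IER) sampling.

    buffer: list of transitions in insertion (time) order.
    td_errors: per-transition TD errors, used here as the "surprise" signal.
    p: uniform mixing fraction from Algorithm 1 (p = 0 in most experiments).
    """
    n_uniform = int(p * batch_size)
    n_back = batch_size - n_uniform
    # Find the most "surprising" transition and look back from it.
    pivot = max(range(len(buffer)), key=lambda i: td_errors[i])
    start = max(0, pivot - n_back + 1)
    batch = list(buffer[start:pivot + 1])       # transitions leading up to the surprise
    batch += random.sample(buffer, n_uniform)   # uniform mixing component
    return batch
```

With p = 0 the batch consists entirely of the transitions that immediately precede the highest-surprise point, which is the behavior the paper reports using for most environments.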
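For context on the α and β values tuned above: in PER (Schaul et al., 2015), α controls how strongly TD-error priorities skew sampling (α = 0 recovers uniform replay) and β controls how fully importance weights correct the resulting bias (β = 1 is full correction). A minimal sketch of these two quantities, written by us rather than taken from the paper's code, with the defaults quoted in the setup above:

```python
import numpy as np

def per_probabilities(td_errors, alpha=0.4, eps=1e-6):
    # Priorities p_i = (|delta_i| + eps)^alpha; alpha = 0 gives uniform sampling.
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def importance_weights(probs, indices, beta=0.6):
    # w_i = (N * P(i))^(-beta), normalized by the max weight for stability;
    # beta = 1 fully corrects the bias from non-uniform sampling.
    n = len(probs)
    w = (n * probs[indices]) ** (-beta)
    return w / w.max()
```

Tuning α and β per environment, as the paper does outside Atari, trades off how aggressively high-error transitions are replayed against how much the gradient updates are down-weighted to compensate.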