Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Introspective Experience Replay: Look Back When Surprised

Authors: Ramnath Kumar, Dheeraj Mysore Nagaraj

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Through empirical evaluations, we demonstrate that IER with neural function approximation yields reliable and superior performance compared to UER, PER, and hindsight experience replay (HER) across most tasks. Our main findings are summarized below: Better Performance Against SOTA: Our proposed methodology (IER) outperforms previous state-of-the-art baselines such as PER, UER and HER on most environments (see Table 1, Section 5)."
Researcher Affiliation | Industry | Ramnath Kumar (Google Research); Dheeraj Nagaraj (Google Research)
Pseudocode | Yes | "Algorithm 1: Our proposed Introspective Experience Replay (IER) for Reinforcement Learning"
Open Source Code | Yes | "Our source code is made available for additional reference: https://github.com/google-research/look-back-when-surprised"
Open Datasets | Yes | "In this paper, we work with thirteen datasets, all of which are open-sourced in gym (https://github.com/openai/gym). More information about the environments is available in Appendix A."
Dataset Splits | No | The paper uses various OpenAI Gym environments and refers to
Hardware Specification | Yes | "All runs have been run using the A100-SXM4-40GB, TITAN RTX, and V100 GPUs."
Software Dependencies | No | The paper mentions using DQN, DDPG, and TD3 algorithms, and OpenAI Gym (https://github.com/openai/gym), but does not provide specific version numbers for any software libraries, programming languages, or environments used.
Experiment Setup | Yes | "Hyperparameters: Refer to Appendix B for the exact hyperparameters used. Across all our experiments on various environments, we use a standard setting for all the different experience replay buffers. This classic setting is set so we can reproduce state-of-the-art performance using UER on the respective environment. For most of our experiments, we set the uniform mixing fraction (p) from Algorithm 1 to be 0. We use a non-zero p value only for a few environments to avoid becoming overtly selective while training, as described in Appendix B. For PER, we tune the α and β hyperparameters used in the Schaul et al. (2015) paper across all environments other than Atari. The default values of α = 0.4 and β = 0.6 are robust on Atari environments as shown by extensive hyperparameter search by Schaul et al. (2015). We detail the results from our grid search in Appendix C.2. Tables 5, 6, 7 and 8 list detailed hyperparameters for different algorithms and environments."
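Algorithm 1 itself is not reproduced in this card. As a rough illustration of the "look back when surprised" idea and of the uniform mixing fraction p mentioned above, the sketch below is our own reading, not the authors' code: all names and details (TD error as the surprise signal, the exact look-back window) are assumptions. It picks the most surprising transition in the buffer, replays the transitions leading up to it, and mixes in a p-fraction of uniform samples.

```python
import random

def ier_sample(buffer, td_errors, batch_size, p=0.0):
    """Hypothetical sketch of Introspective Experience Replay (IER) sampling.

    buffer: list of transitions in insertion (time) order.
    td_errors: per-transition TD errors, used here as the "surprise" signal.
    p: uniform mixing fraction from Algorithm 1 (p = 0 in most experiments).
    """
    n_uniform = int(p * batch_size)
    n_back = batch_size - n_uniform
    # Find the most "surprising" transition and look back from it.
    pivot = max(range(len(buffer)), key=lambda i: td_errors[i])
    start = max(0, pivot - n_back + 1)
    batch = list(buffer[start:pivot + 1])       # transitions leading up to the surprise
    batch += random.sample(buffer, n_uniform)   # uniform mixing component
    return batch
```

With p = 0 the batch consists entirely of the transitions that immediately precede the highest-surprise point, which is the behavior the paper reports using for most environments.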
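For context on the α and β values tuned above: in PER (Schaul et al., 2015), α controls how strongly TD-error priorities skew sampling (α = 0 recovers uniform replay) and β controls how fully importance weights correct the resulting bias (β = 1 is full correction). A minimal sketch of these two quantities, written by us rather than taken from the paper's code, with the defaults quoted in the setup above:

```python
import numpy as np

def per_probabilities(td_errors, alpha=0.4, eps=1e-6):
    # Priorities p_i = (|delta_i| + eps)^alpha; alpha = 0 gives uniform sampling.
    priorities = (np.abs(td_errors) + eps) ** alpha
    return priorities / priorities.sum()

def importance_weights(probs, indices, beta=0.6):
    # w_i = (N * P(i))^(-beta), normalized by the max weight for stability;
    # beta = 1 fully corrects the bias from non-uniform sampling.
    n = len(probs)
    w = (n * probs[indices]) ** (-beta)
    return w / w.max()
```

Tuning α and β per environment, as the paper does outside Atari, trades off how aggressively high-error transitions are replayed against how much the gradient updates are down-weighted to compensate.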