Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Introspective Experience Replay: Look Back When Surprised
Authors: Ramnath Kumar, Dheeraj Mysore Nagaraj
TMLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through empirical evaluations, we demonstrate that IER with neural function approximation yields reliable and superior performance compared to UER, PER, and hindsight experience replay (HER) across most tasks. Our main findings are summarized below: Better Performance Against SOTA: Our proposed methodology (IER) outperforms previous state-of-the-art baselines such as PER, UER and HER on most environments (see Table 1, Section 5). |
| Researcher Affiliation | Industry | Ramnath Kumar (Google Research); Dheeraj Nagaraj (Google Research) |
| Pseudocode | Yes | Algorithm 1: Our proposed Introspective Experience Replay (IER) for Reinforcement Learning |
| Open Source Code | Yes | Our source code is made available for additional reference: https://github.com/google-research/look-back-when-surprised |
| Open Datasets | Yes | In this paper, we work with thirteen datasets, all of which are open-sourced in gym (https://github.com/openai/gym). More information about the environments is available in Appendix A. |
| Dataset Splits | No | The paper uses various OpenAI Gym environments and does not report explicit train/validation/test dataset splits. |
| Hardware Specification | Yes | All runs have been run using the A100-SXM4-40GB, TITAN RTX, and V100 GPUs. |
| Software Dependencies | No | The paper mentions using DQN, DDPG, and TD3 algorithms, and OpenAI Gym (https://github.com/openai/gym), but does not provide specific version numbers for any software libraries, programming languages, or environments used. |
| Experiment Setup | Yes | Hyperparameters: Refer to Appendix B for the exact hyperparameters used. Across all our experiments on various environments, we use a standard setting for all the different experience replay buffers. This classic setting is set so we can reproduce state-of-the-art performance using UER on the respective environment. For most of our experiments, we set the uniform mixing fraction (p) from Algorithm 1 to be 0. We use a non-zero p value only for a few environments to avoid becoming overly selective while training, as described in Appendix B. For PER, we tune the α and β hyperparameters used in the Schaul et al. (2015) paper across all environments other than Atari. The default values of α = 0.4 and β = 0.6 are robust on Atari environments as shown by extensive hyperparameter search by Schaul et al. (2015). We detail the results from our grid search in Appendix C.2. Tables 5, 6, 7 and 8 list detailed hyperparameters for different algorithms and environments. |
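The Pseudocode and Experiment Setup rows describe IER's sampling rule at a high level: select the most "surprising" timesteps (largest TD error) as pivots, replay the transitions that preceded each pivot, and mix in a fraction p of uniformly sampled transitions. The sketch below illustrates that idea only; the function name `ier_sample`, the `num_pivots` parameter, and the look-back window size are assumptions for illustration, not the authors' Algorithm 1 verbatim.

```python
import numpy as np

def ier_sample(td_errors, batch_size, num_pivots=4, p=0.0, rng=None):
    """Illustrative sketch of IER-style sampling ("look back when surprised").

    td_errors: per-transition TD errors for the current replay buffer.
    p: uniform mixing fraction, 0 for most environments per the paper.
    """
    rng = rng or np.random.default_rng()
    n = len(td_errors)
    n_uniform = int(p * batch_size)
    per_pivot = (batch_size - n_uniform) // num_pivots

    # Pivots: the most "surprising" timesteps (largest absolute TD error).
    pivots = np.argsort(np.abs(td_errors))[-num_pivots:]

    idx = []
    for piv in pivots:
        lo = max(0, piv - per_pivot + 1)
        idx.extend(range(lo, piv + 1))  # look back from each pivot
    # Top up with uniform draws (mixing fraction p, plus any shortfall
    # when a pivot sits near the start of the buffer).
    idx.extend(rng.integers(0, n, size=max(0, batch_size - len(idx))))
    return np.asarray(idx[:batch_size])
```

A batch drawn this way is dominated by transitions that led up to high-surprise states, which is the behavior the paper contrasts with UER (uniform) and PER (priority-proportional) sampling.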