Topological Experience Replay
Authors: Zhang-Wei Hong, Tao Chen, Yen-Chen Lin, Joni Pajarinen, Pulkit Agrawal
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically show that our method is substantially more data-efficient than several baselines on a diverse range of goal-reaching tasks. Notably, the proposed method also outperforms baselines that consume more batches of training experience and operates from high-dimensional observational data such as images. |
| Researcher Affiliation | Academia | Improbable AI Lab, Massachusetts Institute of Technology; Aalto University |
| Pseudocode | Yes | "Algorithm 1 Topological Experience Replay for Q-Learning" (a minimal sketch of this reverse-sweep update appears after the table) |
| Open Source Code | Yes | Code is included in the zip file. |
| Open Datasets | Yes | We evaluate TER in Minigrid (Chevalier-Boisvert et al., 2018) and Sokoban (Schrader, 2018), and the references provide URLs: "Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic gridworld environment for openai gym. https://github.com/maximecb/gym-minigrid, 2018." and "Max-Philipp B. Schrader. gym-sokoban. https://github.com/mpSchrader/gym-sokoban, 2018." (an environment-setup sketch follows the table) |
| Dataset Splits | No | The paper does not provide specific details on training, validation, and test dataset splits, as experiments are conducted in simulation environments (Minigrid, Sokoban) where new episodes are generated rather than using fixed dataset splits. |
| Hardware Specification | No | The acknowledgments state: "We are grateful to MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources." However, specific details such as GPU or CPU models are not given. |
| Software Dependencies | No | The paper mentions using the 'pfrl codebase' and specific optimizers like 'Adam' and 'RMSProp', but does not provide version numbers for these software components. |
| Experiment Setup | Yes | Batch size, optimizer, and learning rate: For all environments, we set batch size=64 for Minigrid, batch size=32 for Sokoban and Atari. The optimizers are Adam with learning rate=3e-4 for Minigrid and Sokoban. For Atari, we follow the configuration in (Mnih et al., 2015) and use RMSProp optimizer with learning rate=2.5e-4, alpha=0.95, eps=1e-2, and momentum=0.0. (an optimizer-configuration sketch follows the table) |
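
For the pseudocode row: the core of Algorithm 1 is to organize replayed transitions as a directed graph and apply Q-learning backups in reverse breadth-first order from terminal states, so that a transition's bootstrap target is refreshed before the transitions leading into it. Below is a minimal tabular sketch of that idea, assuming hashable states; `TransitionGraph`, `reverse_sweep`, and `q_learning_sweep` are illustrative names, and the paper's actual implementation operates on deep Q-networks via the pfrl codebase.

```python
# Minimal, hedged sketch of the reverse-sweep idea behind Algorithm 1.
# Names and the tabular setting are illustrative assumptions, not the
# paper's implementation (which trains deep Q-networks).
from collections import defaultdict, deque

class TransitionGraph:
    """Replay buffer organized as a directed graph over hashable states."""

    def __init__(self):
        self.in_edges = defaultdict(list)  # state -> transitions ending in it
        self.terminals = set()             # states where an episode ended

    def add(self, s, a, r, s_next, done):
        self.in_edges[s_next].append((s, a, r, s_next, done))
        if done:
            self.terminals.add(s_next)

    def reverse_sweep(self):
        """Yield transitions in breadth-first order walking backwards from
        terminal states, so each bootstrap target is updated before the
        transitions that depend on it."""
        frontier = deque(self.terminals)
        visited = set(self.terminals)
        while frontier:
            state = frontier.popleft()
            for s, a, r, s_next, done in self.in_edges[state]:
                yield s, a, r, s_next, done
                if s not in visited:
                    visited.add(s)
                    frontier.append(s)

def q_learning_sweep(q, graph, actions=(0, 1), alpha=0.5, gamma=0.99):
    """One pass of tabular Q-learning over a reverse sweep of the graph."""
    for s, a, r, s_next, done in graph.reverse_sweep():
        target = r if done else r + gamma * max(q[(s_next, b)] for b in actions)
        q[(s, a)] += alpha * (target - q[(s, a)])

# Tiny usage example: the transition into the goal is updated first, so the
# earlier transition bootstraps from an already-improved value.
q = defaultdict(float)
g = TransitionGraph()
g.add("s0", 0, 0.0, "s1", done=False)
g.add("s1", 1, 1.0, "goal", done=True)
q_learning_sweep(q, g)
```

Sweeping backwards from terminal states is what distinguishes this from uniform replay sampling: value information propagates from the goal outward in a single pass rather than diffusing over many random updates.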
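For the open-datasets row: a minimal setup sketch for the cited environment packages. The environment ids below (`MiniGrid-MultiRoom-N6-v0`, `Sokoban-v0`) and the classic four-tuple Gym step API are assumptions based on the repositories' documentation, not settings confirmed by the paper.

```python
# Hedged sketch: instantiating the cited environments. Ids are assumed
# from each repository's README and may vary across package versions.
import gym
import gym_minigrid  # registers MiniGrid-* ids (maximecb/gym-minigrid)
import gym_sokoban   # registers Sokoban-* ids (mpSchrader/gym-sokoban)

env = gym.make("MiniGrid-MultiRoom-N6-v0")  # assumed MiniGrid task id
obs = env.reset()                           # classic Gym API (pre-0.26)
obs, reward, done, info = env.step(env.action_space.sample())

sokoban_env = gym.make("Sokoban-v0")        # assumed Sokoban task id
```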
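For the experiment-setup row: the reported hyperparameters translate directly into PyTorch optimizer constructors (pfrl, which the paper builds on, is PyTorch-based). The `q_network` below is a stand-in module, not the paper's architecture.

```python
# Hedged sketch of the reported optimizer settings; q_network is a
# placeholder, not the architecture used in the paper.
import torch
import torch.nn as nn

q_network = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 4))

# Minigrid and Sokoban: Adam with learning rate 3e-4
# (batch size 64 for Minigrid, 32 for Sokoban).
adam = torch.optim.Adam(q_network.parameters(), lr=3e-4)

# Atari, following Mnih et al. (2015): RMSProp with lr=2.5e-4,
# alpha=0.95, eps=1e-2, momentum=0.0 (batch size 32).
rmsprop = torch.optim.RMSprop(
    q_network.parameters(), lr=2.5e-4, alpha=0.95, eps=1e-2, momentum=0.0
)
```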