Topological Experience Replay

Authors: Zhang-Wei Hong, Tao Chen, Yen-Chen Lin, Joni Pajarinen, Pulkit Agrawal

Venue: ICLR 2022

Reproducibility assessment. Each entry gives the variable, the assessed result, and the supporting LLM response.
Research Type: Experimental. "We empirically show that our method is substantially more data-efficient than several baselines on a diverse range of goal-reaching tasks. Notably, the proposed method also outperforms baselines that consume more batches of training experience and operates from high-dimensional observational data such as images."
Researcher Affiliation: Academia. "Improbable AI Lab, Massachusetts Institute of Technology; Aalto University."
Pseudocode: Yes. The paper provides Algorithm 1, "Topological Experience Replay for Q-Learning"; a hedged sketch of the replay scheme follows.
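
The core idea reported in the paper is to store transitions as a directed graph over states and replay them in reverse breadth-first order from terminal states, so that Q-value updates propagate backward from the goal. The sketch below is one minimal reading of that scheme; the class name, the terminal-state bookkeeping, and the batching details are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict, deque

class TopologicalReplayBuffer:
    """Minimal sketch: a replay buffer organized as a directed graph over
    (hashable) states, replayed via backward BFS from terminal states."""

    def __init__(self):
        # incoming[s_next] holds every stored transition ending in s_next
        self.incoming = defaultdict(list)
        self.terminals = set()  # states where an episode terminated

    def add(self, s, a, r, s_next, done):
        self.incoming[s_next].append((s, a, r, s_next, done))
        if done:
            self.terminals.add(s_next)

    def reverse_bfs_batch(self, batch_size):
        """Collect transitions in backward-BFS order from terminal states,
        so transitions closest to the goal are replayed first. In practice
        one would fall back to uniform sampling while the graph has no
        terminal states yet (an assumption of this sketch)."""
        batch = []
        visited = set(self.terminals)
        frontier = deque(self.terminals)
        while frontier and len(batch) < batch_size:
            s_next = frontier.popleft()
            for (s, a, r, s2, done) in self.incoming[s_next]:
                batch.append((s, a, r, s2, done))
                if s not in visited:
                    visited.add(s)
                    frontier.append(s)
        return batch[:batch_size]
```

A Q-learning loop would then update on the returned batch in order, so each transition's bootstrap target state has typically already been refreshed earlier in the same sweep.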
Open Source Code: Yes. "Code is included in the zip file."
Open Datasets: Yes. "We evaluate TER in Minigrid (Chevalier-Boisvert et al., 2018) and Sokoban (Schrader, 2018)", and the references provide URLs: "Maxime Chevalier-Boisvert, Lucas Willems, and Suman Pal. Minimalistic gridworld environment for openai gym. https://github.com/maximecb/gym-minigrid, 2018." and "Max-Philipp B. Schrader. gym-sokoban. https://github.com/mpSchrader/gym-sokoban, 2018." A minimal environment-setup sketch follows.
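
Both cited repositories register their tasks with Gym on import, so a minimal setup plausibly looks like the sketch below. The environment IDs are illustrative examples taken from the gym-minigrid and gym-sokoban READMEs, not necessarily the exact tasks used in the paper, and the four-tuple step API assumes the 2018-era Gym versions those packages target.

```python
import gym
import gym_minigrid  # registers MiniGrid-* environments on import
import gym_sokoban   # registers Sokoban-* environments on import

# Example task IDs (assumptions; see each repository's README for the full list)
minigrid_env = gym.make("MiniGrid-MultiRoom-N6-v0")
sokoban_env = gym.make("Sokoban-v0")

obs = minigrid_env.reset()
obs, reward, done, info = minigrid_env.step(minigrid_env.action_space.sample())
```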
Dataset Splits: No. The paper does not provide training, validation, and test dataset splits; experiments are conducted in simulation environments (Minigrid, Sokoban) where new episodes are generated rather than drawn from fixed splits.
Hardware Specification: No. "We are grateful to MIT Supercloud and the Lincoln Laboratory Supercomputing Center for providing HPC resources." Specific hardware such as GPU or CPU models is not mentioned.
Software Dependencies: No. The paper mentions the pfrl codebase and the Adam and RMSProp optimizers, but does not provide version numbers for these software components.
Experiment Setup: Yes. "Batch size, Optimizer, and Learning rate: For all environments, we set batch size=64 for Minigrid, batch size=32 for Sokoban and Atari. The optimizers are Adam with learning rate=3e-4 for Minigrid and Sokoban. For Atari, we follow the configuration in (Mnih et al., 2015) and use RMSProp optimizer with learning rate=2.5e-4, alpha=0.95, eps=1e-2, and momentum=0.0." An optimizer-configuration sketch follows.
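
These quoted hyperparameters map directly onto standard PyTorch optimizers, as sketched below. The paper builds on the pfrl codebase, so wiring through torch.optim here, and the placeholder q_network, are assumptions for illustration only.

```python
import torch.nn as nn
import torch.optim as optim

q_network = nn.Linear(64, 4)  # hypothetical placeholder Q-network

# Minigrid and Sokoban: Adam with learning rate 3e-4
# (batch size 64 for Minigrid, 32 for Sokoban, per the quoted setup)
adam = optim.Adam(q_network.parameters(), lr=3e-4)

# Atari, following Mnih et al. (2015): RMSProp with the quoted settings
rmsprop = optim.RMSprop(
    q_network.parameters(),
    lr=2.5e-4,
    alpha=0.95,
    eps=1e-2,
    momentum=0.0,
)
```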