Sparse Graphical Memory for Robust Planning

Authors: Scott Emmons, Ajay Jain, Misha Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Experimentally, we show that SGM significantly outperforms current state of the art methods on long horizon, sparse-reward visual navigation tasks. Project video and code are available at https://mishalaskin.github.io/sgm/. We evaluate SGM under two high-level learning frameworks: reinforcement learning (RL) and self-supervised learning (SSL)." |
| Researcher Affiliation | Academia | "Scott Emmons* (Berkeley AI Research), Ajay Jain* (Berkeley AI Research), Michael Laskin* (Berkeley AI Research), Thanard Kurutach (Berkeley AI Research), Pieter Abbeel (Berkeley AI Research), Deepak Pathak (Carnegie Mellon University). *Equal contribution; author order determined randomly. {emmons, ajayj, mlaskin}@berkeley.edu. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada." |
| Pseudocode | Yes | "Algorithm 1: Build Sparse Graph" (a hedged sketch of this procedure appears after the table). |
| Open Source Code | Yes | "Project video and code are available at https://mishalaskin.github.io/sgm/." |
| Open Datasets | Yes | "PointEnv [9]: continuous control of a point mass in a maze, used in SoRB; observations and goals are positional (x, y) coordinates. ViZDoom [49]: discrete control of an agent in a visual maze environment, used in SPTM; observations and goals are images. Safety Gym [38]: continuous control of an agent in a visual maze environment; observations and goals are images, though odometry data is available for observations but not for goals." |
| Dataset Splits | No | The paper names the environments used for evaluation but does not specify explicit training, validation, and test splits (as percentages or counts) needed for reproduction. |
| Hardware Specification | No | The paper does not report the hardware used to run the experiments (e.g., GPU models, CPU types, or memory). |
| Software Dependencies | No | The paper mentions Python, PyTorch, and CUDA in Appendix C, but it does not give version numbers for these components, which are needed for reproducibility. |
| Experiment Setup | Yes | "All networks are trained with the Adam optimizer [23, 24] with a learning rate of 1e-4 and batch size of 256. The replay buffer is size 1e6 transitions. We use uniform random actions for the first 1M steps for exploration, then switch to an epsilon-greedy strategy with a linear schedule from 1.0 to 0.1 over 1M steps." (A training-setup sketch appears after the table.) |
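For orientation, here is a minimal Python sketch of the two-way consistency test behind Algorithm 1 (Build Sparse Graph): a new observation is merged into the graph rather than added as a node when some existing node agrees with it on distances to and from every current node. It assumes a learned goal-conditioned distance estimate `dist(s, g)`; the function name, the thresholds `tau_a` and `max_edge_dist`, and their defaults are illustrative, not the authors' exact implementation.

```python
def build_sparse_graph(observations, dist, tau_a=1.0, max_edge_dist=3.0):
    """Sketch of sparse graph construction via two-way consistency.

    dist(s, g) is a learned distance estimate (e.g., derived from
    goal-conditioned Q-values). tau_a is the consistency threshold;
    max_edge_dist bounds which node pairs get connected.
    """
    nodes = []
    for obs in observations:
        # obs is redundant if some existing node is two-way consistent
        # with it: their distances agree, within tau_a, to every current
        # node (outgoing direction) and from every current node (incoming).
        redundant = any(
            all(
                abs(dist(obs, w) - dist(node, w)) < tau_a      # outgoing
                and abs(dist(w, obs) - dist(w, node)) < tau_a  # incoming
                for w in nodes
            )
            for node in nodes
        )
        if not redundant:
            nodes.append(obs)

    # Connect node pairs the agent is estimated to traverse cheaply.
    edges = [
        (i, j)
        for i in range(len(nodes))
        for j in range(len(nodes))
        if i != j and dist(nodes[i], nodes[j]) < max_edge_dist
    ]
    return nodes, edges


# Toy usage on (x, y) points, with Euclidean distance standing in for the
# learned estimator (reasonable for PointEnv-style coordinates).
import numpy as np

pts = [np.array(p, dtype=float) for p in [(0, 0), (0.1, 0), (5, 5), (5, 5.05)]]
nodes, edges = build_sparse_graph(
    pts, lambda s, g: float(np.linalg.norm(s - g)), tau_a=0.5
)
# nodes collapses to two representatives, near (0, 0) and (5, 5).
```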
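The quoted experiment setup maps directly onto a few lines of PyTorch. Below is a minimal sketch assuming a DQN-style agent; only the hyperparameters (learning rate, batch size, buffer size, schedule) come from the paper, while the placeholder network dimensions and variable names are hypothetical.

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from the paper.
LR, BATCH_SIZE, BUFFER_SIZE = 1e-4, 256, int(1e6)
RANDOM_STEPS = 1_000_000     # uniform random actions for the first 1M steps
EPS_DECAY_STEPS = 1_000_000  # then anneal epsilon linearly over 1M steps
EPS_START, EPS_END = 1.0, 0.1

# Placeholder network; the paper's architectures differ per environment.
q_network = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 4))
optimizer = torch.optim.Adam(q_network.parameters(), lr=LR)

def epsilon(step: int) -> float:
    """Exploration rate: fully random during the warm-up phase, then a
    linear ramp from 1.0 down to 0.1 over EPS_DECAY_STEPS."""
    if step < RANDOM_STEPS:
        return EPS_START
    frac = min(1.0, (step - RANDOM_STEPS) / EPS_DECAY_STEPS)
    return EPS_START + frac * (EPS_END - EPS_START)
```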