Sparse Graphical Memory for Robust Planning
Authors: Scott Emmons, Ajay Jain, Misha Laskin, Thanard Kurutach, Pieter Abbeel, Deepak Pathak
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimentally, we show that SGM significantly outperforms current state-of-the-art methods on long-horizon, sparse-reward visual navigation tasks. Project video and code are available at https://mishalaskin.github.io/sgm/. We evaluate SGM under two high-level learning frameworks: reinforcement learning (RL) and self-supervised learning (SSL). |
| Researcher Affiliation | Academia | Scott Emmons* (Berkeley AI Research), Ajay Jain* (Berkeley AI Research), Michael Laskin* (Berkeley AI Research), Thanard Kurutach (Berkeley AI Research), Pieter Abbeel (Berkeley AI Research), Deepak Pathak (Carnegie Mellon University). *Equal contribution; author order determined randomly. {emmons, ajayj, mlaskin}@berkeley.edu. 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Vancouver, Canada. |
| Pseudocode | Yes | Algorithm 1 (Build Sparse Graph); a hedged sketch follows this table. |
| Open Source Code | Yes | Project video and code are available at https://mishalaskin.github.io/sgm/. |
| Open Datasets | Yes | PointEnv [9]: continuous control of a point-mass in a maze, used in SoRB; observations and goals are positional (x, y) coordinates. ViZDoom [49]: discrete control of an agent in a visual maze environment, used in SPTM; observations and goals are images. Safety Gym [38]: continuous control of an agent in a visual maze environment; observations and goals are images, though odometry data is available for observations but not for goals. |
| Dataset Splits | No | The paper mentions environments used for testing but does not specify explicit training, validation, and test dataset splits in terms of percentages or counts required for reproduction. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions Python, PyTorch, and CUDA in Appendix C, but it does not specify version numbers for these software components, which are required for reproducibility. |
| Experiment Setup | Yes | All networks are trained with the Adam optimizer [23, 24] with a learning rate of 1e-4 and a batch size of 256. The replay buffer holds 1e6 transitions. We use uniform random actions for the first 1M steps for exploration, then switch to an epsilon-greedy strategy with a linear schedule from 1.0 to 0.1 over 1M steps. (A minimal configuration sketch follows this table.) |
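
Since the Pseudocode row only names the paper's Algorithm 1 (Build Sparse Graph), here is a minimal Python sketch of the underlying idea: a candidate state is merged into an existing node when the two are interchangeable as both goals and starting points under a learned directed distance estimate, within a threshold. The function names, the `d` callable, and the `max_dist` edge cutoff are illustrative assumptions, not the authors' released code.

```python
def two_way_consistent(s, w, nodes, d, tau_a):
    """True if s and w are interchangeable as goals (in-distances) and
    as starting states (out-distances), up to tolerance tau_a."""
    out_ok = all(abs(d(s, x) - d(w, x)) <= tau_a for x in nodes)
    in_ok = all(abs(d(x, s) - d(x, w)) <= tau_a for x in nodes)
    return out_ok and in_ok


def build_sparse_graph(states, d, tau_a, max_dist):
    """Sketch of sparse graph construction under two-way consistency.

    states:   candidate nodes (e.g., sampled from the replay buffer)
    d:        directed distance estimate d(s1, s2) (assumed callable)
    tau_a:    merge threshold for two-way consistency
    max_dist: illustrative reachability cutoff for adding edges
    """
    nodes = []
    for s in states:
        # Keep s only if no existing node already covers it.
        if not any(two_way_consistent(s, w, nodes, d, tau_a) for w in nodes):
            nodes.append(s)
    # Connect node pairs the distance estimate deems reachable.
    edges = [(u, v) for i, u in enumerate(nodes) for j, v in enumerate(nodes)
             if i != j and d(u, v) <= max_dist]
    return nodes, edges


# Toy usage with a symmetric distance on 1-D points: nearby points merge.
points = [0.0, 0.05, 1.0, 1.02, 2.0]
nodes, edges = build_sparse_graph(points, lambda a, b: abs(a - b),
                                  tau_a=0.1, max_dist=1.5)
print(nodes)  # [0.0, 1.0, 2.0]
```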
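The Experiment Setup row translates directly into a training configuration. Below is a minimal PyTorch sketch of those reported hyperparameters; the network architecture, variable names, and `epsilon` helper are assumptions for illustration, not the released code.

```python
import torch

# Hyperparameters reported in the paper.
LEARNING_RATE = 1e-4
BATCH_SIZE = 256
REPLAY_CAPACITY = int(1e6)
RANDOM_STEPS = 1_000_000       # uniform random actions for exploration
EPS_DECAY_STEPS = 1_000_000    # then a linear epsilon schedule
EPS_START, EPS_END = 1.0, 0.1

# Placeholder network: the excerpt does not tie these hyperparameters
# to a specific architecture.
q_network = torch.nn.Sequential(torch.nn.Linear(4, 64), torch.nn.ReLU(),
                                torch.nn.Linear(64, 2))
optimizer = torch.optim.Adam(q_network.parameters(), lr=LEARNING_RATE)


def epsilon(step: int) -> float:
    """Exploration rate: fully random for the first 1M steps, then a
    linear ramp from 1.0 down to 0.1 over the following 1M steps."""
    if step < RANDOM_STEPS:
        return 1.0
    frac = min((step - RANDOM_STEPS) / EPS_DECAY_STEPS, 1.0)
    return EPS_START + frac * (EPS_END - EPS_START)
```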