Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

Authors: Ben Eysenbach, Russ R. Salakhutdinov, Sergey Levine

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare SoRB to prior methods on two tasks: a simple 2D environment, and then a visual navigation task, where our method will plan over images. Ablation experiments will illustrate that accurate distance estimates are crucial to our algorithm's success.
Researcher Affiliation | Collaboration | Benjamin Eysenbach (CMU, Google Brain), Ruslan Salakhutdinov (CMU), Sergey Levine (Google Brain, UC Berkeley); beysenba@cs.cmu.edu
Pseudocode | Yes | Algorithm 1: inputs are the current state s, the goal state s_g, a buffer of observations B, the learned policy π, and its value function V; returns an action a. function SEARCHPOLICY(s, s_g, B, V, π)
Open Source Code | No | The paper links to a browser-based demo (http://bit.ly/rl_search) but not to source code for the described method.
Open Datasets | Yes | We use 3D houses from the SUNCG dataset (Song et al., 2017), similar to the task described by Shah et al. (2018).
Dataset Splits | No | The paper describes training on 100 SUNCG houses and evaluating on 22 held-out houses, but does not give explicit train/validation/test splits (e.g., percentages or sample counts) for a single dataset.
Hardware Specification | No | The paper does not specify the GPU or CPU models, or any other hardware, used to run the experiments.
Software Dependencies | No | The paper names algorithms such as DQN, DDPG, and C51, but does not list the software libraries used or their version numbers.
Experiment Setup | Yes | For VIN, we tuned the number of iterations as well as the number of hidden units in the recurrent layer. For SPTM, we performed a grid search over the threshold for adding edges, the threshold for choosing the next waypoint along the shortest path, and the parameters for sampling the training data.
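The Pseudocode entry quotes only the signature of SEARCHPOLICY. The following Python sketch shows what a policy with that signature could look like. It is not the paper's Algorithm 1 and rests on several assumptions this summary does not state. Distances are read off the value function as d(s, s') = -V(s, s'), which holds if V was trained with a reward of -1 per step and no discounting. The graph over the buffer B is searched with Dijkstra's algorithm, and edges longer than a hypothetical `max_dist` cutoff are dropped. The policy then steers toward the first waypoint on the shortest path. The helper `shortest_path` and the parameter `max_dist` are illustrative names, not taken from the paper.

```python
import heapq
import math
from typing import Callable, Hashable, List, Sequence, Tuple

State = Hashable


def shortest_path(
    s: State,
    s_g: State,
    buffer: Sequence[State],
    dist: Callable[[State, State], float],
    max_dist: float,
) -> Tuple[List[State], float]:
    """Dijkstra's algorithm over a graph whose nodes are s, the buffer states and s_g.

    A directed edge u -> v exists only when dist(u, v) <= max_dist (a hypothetical
    cutoff that discards long, less trustworthy distance estimates). Distances must
    be non-negative. Returns (path from s to s_g, total distance), or ([], inf)
    if s_g is unreachable.
    """
    nodes = [s, *buffer, s_g]
    goal = len(nodes) - 1
    best = [math.inf] * len(nodes)
    prev = [-1] * len(nodes)
    best[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > best[u]:
            continue  # stale heap entry
        if u == goal:
            break
        for v in range(1, len(nodes)):  # node 0 (the start) is never re-entered
            if v == u:
                continue
            w = dist(nodes[u], nodes[v])
            if w > max_dist:
                continue  # edge too long: not added to the graph
            if d + w < best[v]:
                best[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (best[v], v))
    if math.isinf(best[goal]):
        return [], math.inf
    path, v = [], goal
    while v != -1:  # walk predecessors back to the start (prev[0] == -1)
        path.append(nodes[v])
        v = prev[v]
    return path[::-1], best[goal]


def search_policy(s, s_g, buffer, value_fn, policy, max_dist):
    """Sketch of SEARCHPOLICY(s, s_g, B, V, pi): act toward the first waypoint."""

    # Assumption: V(a, b) = -(expected steps from a to b), so -V is a distance.
    def dist(a, b):
        return -value_fn(a, b)

    path, _ = shortest_path(s, s_g, buffer, dist, max_dist)
    # path[1] is the first waypoint (or s_g itself when the direct edge is best);
    # if the graph contains no path, fall back to conditioning on the goal.
    waypoint = path[1] if path else s_g
    return policy(s, waypoint)
```

In this sketch, the edge cutoff is one place where distance accuracy matters. An overestimated short edge drops out of the graph. An underestimated long edge adds a shortcut that the goal-conditioned policy may not be able to follow.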
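The Experiment Setup entry says the SPTM baseline was tuned by grid search over its edge and waypoint thresholds. Below is a generic exhaustive grid search in Python. The parameter names and values in the usage block are hypothetical placeholders, because the source does not give the searched ranges. `evaluate` stands in for whatever success-rate measurement the authors actually ran.

```python
import itertools
from typing import Any, Callable, Dict, Mapping, Optional, Sequence, Tuple

Params = Dict[str, Any]


def grid_search(
    evaluate: Callable[[Params], float],
    grid: Mapping[str, Sequence[Any]],
) -> Tuple[Optional[Params], float]:
    """Score every combination in `grid` and return the best (params, score)."""
    names = list(grid)
    best_params: Optional[Params] = None
    best_score = float("-inf")
    # itertools.product enumerates the full Cartesian product of the value lists.
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # e.g. success rate on validation tasks
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


if __name__ == "__main__":
    # Hypothetical grid; the real threshold ranges are not given in the source.
    grid = {
        "edge_threshold": [0.5, 0.7, 0.9],
        "waypoint_threshold": [0.6, 0.8],
    }

    # Toy objective standing in for a real (expensive) evaluation run.
    def toy_score(p: Params) -> float:
        return -(p["edge_threshold"] - 0.7) ** 2 - (p["waypoint_threshold"] - 0.8) ** 2

    print(grid_search(toy_score, grid))
```

The number of evaluations is the product of the list lengths, so this grid needs 3 × 2 = 6 runs. Each added tuned parameter, such as the training-data sampling settings the entry also mentions, multiplies that cost.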