Search on the Replay Buffer: Bridging Planning and Reinforcement Learning
Authors: Ben Eysenbach, Russ R. Salakhutdinov, Sergey Levine
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare SoRB to prior methods on two tasks: a simple 2D environment, and then a visual navigation task, where our method will plan over images. Ablation experiments will illustrate that accurate distance estimates are crucial to our algorithm's success. |
| Researcher Affiliation | Collaboration | Benjamin Eysenbach (CMU, Google Brain), Ruslan Salakhutdinov (CMU), Sergey Levine (Google Brain, UC Berkeley); beysenba@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1: Inputs are the current state s, the goal state s_g, a buffer of observations B, the learned policy π, and its value function V. Returns an action a. function SEARCHPOLICY(s, s_g, B, V, π) (a sketch of this procedure appears after the table) |
| Open Source Code | No | The paper provides a link to a browser-based demo, 'http://bit.ly/rl_search', but not to the source code of the described methodology. |
| Open Datasets | Yes | We use 3D houses from the SUNCG dataset (Song et al., 2017), similar to the task described by Shah et al. (2018). |
| Dataset Splits | No | The paper describes training on 100 SUNCG houses and evaluating on 22 held-out houses, but does not provide specific train/validation/test splits (e.g., percentages or sample counts) for a single dataset. |
| Hardware Specification | No | The paper does not specify any particular GPU or CPU models, or other hardware specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software like DQN, DDPG, and C51, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For VIN, we tuned the number of iterations as well as the number of hidden units in the recurrent layer. For SPTM, we performed a grid search over the threshold for adding edges, the threshold for choosing the next waypoint along the shortest path, and the parameters for sampling the training data. |
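
The quoted Algorithm 1 lists only the inputs and the function signature. For orientation, here is a minimal Python sketch of how such a search policy could work, assuming a goal-conditioned value function `V(s, g)` that approximates the negative number of steps from `s` to `g` and a goal-conditioned policy `pi(s, g)`; the helper names, the `networkx` dependency, and the `max_dist` edge-pruning threshold are illustrative assumptions, not the authors' exact implementation.

```python
import networkx as nx  # assumed here for the shortest-path search; any graph library works


def search_policy(s, s_g, buffer_states, V, pi, max_dist=7.0):
    """Hedged sketch of SEARCHPOLICY: plan a path over replay-buffer states
    using distances derived from the value function, then act toward the
    first waypoint with the goal-conditioned policy."""
    # Nodes: current state, every state in the buffer, and the goal state.
    nodes = [s] + list(buffer_states) + [s_g]

    graph = nx.DiGraph()
    for i, s_i in enumerate(nodes):
        for j, s_j in enumerate(nodes):
            if i == j:
                continue
            # With a -1-per-step reward, the value function gives a distance
            # estimate: dist(s_i, s_j) ~= -V(s_i, s_j).
            dist = -V(s_i, s_j)
            # Keep only edges the low-level policy can plausibly traverse.
            if dist < max_dist:
                graph.add_edge(i, j, weight=dist)

    start, goal = 0, len(nodes) - 1
    try:
        path = nx.shortest_path(graph, start, goal, weight="weight")
    except nx.NetworkXNoPath:
        # No path through the buffer: fall back to the raw goal-conditioned policy.
        return pi(s, s_g)

    # Act toward the first waypoint after the current state (the goal itself
    # if the path is direct).
    waypoint = nodes[path[1]]
    return pi(s, waypoint)
```

The paper additionally discusses making the distance estimates robust (e.g., via the distributional critics mentioned in the Software Dependencies row), and the ablations quoted above indicate that accuracy of those estimates is crucial; such refinements are omitted from this sketch for readability.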