Bayes-Adaptive Simulation-based Search with Value Function Approximation

Authors: Arthur Guez, Nicolas Heess, David Silver, Peter Dayan

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show that our approach requires considerably fewer simulations to find good policies than BAMCP in a (discrete) bandit task and two continuous control tasks with a Gaussian process prior over the dynamics [5, 6].
Researcher Affiliation | Collaboration | Arthur Guez (1,2), Nicolas Heess (2), David Silver (2), Peter Dayan (1); aguez@google.com; 1: Gatsby Unit, UCL; 2: Google DeepMind
Pseudocode | Yes | Algorithm 1: Bayes-Adaptive simulation-based search with root sampling (a generic root-sampling sketch appears after the table)
Open Source Code | No | The paper does not explicitly state that source code for the described methodology is released, nor does it provide a link to any.
Open Datasets | No | The paper describes the setup of its experimental tasks (Bernoulli bandit, height-map navigation, pendulum swing-up) but does not provide access information (a link, DOI, repository, or formal citation with author/year indicating public availability) for any dataset used.
Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, or a detailed splitting methodology) for training, validation, and testing.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used to run its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments.
Experiment Setup | Yes | We consider the scenario γ = 0.99, p0 = 0.2, for which the optimal decision and the posterior-mean decision frequently differ. We compare BAMCP against BAFA on this domain, planning over 75 steps with a discount of 0.98. We use conventional parameter settings for the pendulum [5]: a mass of 1 kg, a length of 1 m, a maximum torque of 5 Nm, and a coefficient of friction of 0.05 kg m²/s. The state of the pendulum is s = (θ, θ̇). Each time-step corresponds to 0.05 s, γ = 0.98, and the reward function is R(s) = cos(θ). The histogram is computed from 100 runs with (a) K = 10000 or (b) K = 15000 simulations for each algorithm, horizon T = 50, and (for BAFA) M = 50 particles. (A pendulum-simulator sketch appears after the table.)
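
The Pseudocode row refers to Algorithm 1 (Bayes-Adaptive simulation-based search with root sampling), of which only the name is quoted above. Below is a minimal, hypothetical sketch of the generic root-sampling idea: one dynamics model is drawn from the posterior at the root of each simulation, the trajectory is simulated under that fixed model, and a learned value function is updated from the simulated data. The `posterior`, `dynamics`, and `value_fn` interfaces are illustrative assumptions, not the paper's actual Algorithm 1.

```python
# Hypothetical sketch of root sampling with value function approximation.
# All interfaces (posterior.sample, dynamics.step, value_fn.*) are assumed
# placeholders, not the paper's actual Algorithm 1.

def bayes_adaptive_search(start_state, posterior, value_fn,
                          num_simulations, horizon, gamma=0.98):
    """Plan from start_state via simulations, one sampled model per simulation."""
    for _ in range(num_simulations):
        dynamics = posterior.sample()      # root sampling: draw a model once, at the root
        state, trajectory = start_state, []
        for _ in range(horizon):
            action = value_fn.select_action(state)          # e.g. epsilon-greedy
            next_state, reward = dynamics.step(state, action)
            trajectory.append((state, action, reward, next_state))
            state = next_state
        value_fn.update(trajectory, gamma)  # e.g. a TD-style update on simulated data
    return value_fn.greedy_action(start_state)
```

The key property of root sampling is that the sampled model stays fixed for the whole simulated trajectory, so the belief state never needs to be updated inside a simulation.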
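
For the pendulum swing-up numbers in the Experiment Setup row, the following is a minimal sketch of a standard damped-pendulum simulator using the quoted constants (mass 1 kg, length 1 m, maximum torque 5 Nm, friction 0.05 kg m²/s, time-step 0.05 s, reward cos θ). The dynamics equation, the gravity constant, and the convention that θ = 0 is upright are assumptions; only the constants and the reward are quoted from the paper.

```python
import math

# Constants quoted in the Experiment Setup row; the dynamics form below is assumed.
MASS = 1.0        # kg
LENGTH = 1.0      # m
MAX_TORQUE = 5.0  # Nm
FRICTION = 0.05   # kg m^2 / s
DT = 0.05         # s per time-step
GAMMA = 0.98      # discount factor
GRAVITY = 9.81    # m / s^2 (assumed; not stated in the quoted text)


def reward(theta):
    """R(s) = cos(theta): maximal when the pendulum is upright (theta = 0 assumed upright)."""
    return math.cos(theta)


def step(theta, theta_dot, torque):
    """One Euler step of a damped pendulum with theta measured from the upright position."""
    torque = max(-MAX_TORQUE, min(MAX_TORQUE, torque))
    inertia = MASS * LENGTH ** 2
    # Gravity destabilises the upright position, friction damps the angular velocity.
    theta_ddot = (torque - FRICTION * theta_dot
                  + MASS * GRAVITY * LENGTH * math.sin(theta)) / inertia
    theta_dot = theta_dot + DT * theta_ddot
    theta = theta + DT * theta_dot
    return theta, theta_dot
```

Under these settings a discounted return over a horizon of T steps would weight the reward at step t by GAMMA ** t, matching the γ = 0.98 quoted above.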