Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Bayes-Adaptive Simulation-based Search with Value Function Approximation
Authors: Arthur Guez, Nicolas Heess, David Silver, Peter Dayan
NeurIPS 2014 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our approach requires considerably fewer simulations to find good policies than BAMCP in a (discrete) bandit task and two continuous control tasks with a Gaussian process prior over the dynamics [5, 6]. |
| Researcher Affiliation | Collaboration | Arthur Guez (1, 2), Nicolas Heess (2), David Silver (2), Peter Dayan (1); EMAIL; (1) Gatsby Unit, UCL; (2) Google DeepMind |
| Pseudocode | Yes | Algorithm 1: Bayes-Adaptive simulation-based search with root sampling |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is being released or provide a link to it. |
| Open Datasets | No | The paper describes the setup for its experimental tasks (Bernoulli bandit, height map navigation, pendulum swing-up) but does not provide specific access information (link, DOI, repository, or formal citation with author/year for public availability) for any dataset used. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or detailed splitting methodology) for training, validation, and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | We consider the scenario γ = 0.99, p0 = 0.2 for which the optimal decision, and the posterior mean decision frequently differ. We compare BAMCP against BAFA on this domain, planning over 75 steps with a discount of 0.98. We use conventional parameter settings for the pendulum [5], a mass of 1 kg, a length of 1 m, a maximum torque of 5 Nm, and coefficient of friction of 0.05 kg m²/s. The state of the pendulum is s = (θ, θ̇). Each time-step corresponds to 0.05 s, γ = 0.98, and the reward function is R(s) = cos(θ). The histogram is computed with 100 runs with (a) K = 10000, or (b) K = 15000, simulations for each algorithm, horizon T = 50 and (for BAFA) M = 50 particles. |
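
The pendulum settings quoted in the Experiment Setup row can be gathered into a small configuration, which may help when attempting a replication. The following is a minimal sketch, not the authors' code: the mass, length, torque limit, friction coefficient, time-step, discount, and reward come from the row above, while the gravity constant, the damped-pendulum equation of motion, the explicit Euler integrator, and all names (`PendulumConfig`, `step`, `discounted_return`) are assumptions introduced for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PendulumConfig:
    # Values quoted in the Experiment Setup row.
    mass: float = 1.0        # kg
    length: float = 1.0      # m
    max_torque: float = 5.0  # N m
    friction: float = 0.05   # kg m^2 / s
    dt: float = 0.05         # seconds per time-step
    gamma: float = 0.98      # discount factor
    # Assumption: gravity is not stated in the row above.
    gravity: float = 9.81    # m / s^2


def reward(theta: float) -> float:
    """R(s) = cos(theta), as quoted; it is largest at theta = 0."""
    return math.cos(theta)


def step(cfg: PendulumConfig, theta: float, theta_dot: float, torque: float):
    """One explicit Euler step of an ASSUMED damped-pendulum model.

    The equation of motion here is a common textbook form, with theta = 0
    taken as upright so that it matches the cos(theta) reward. The paper
    may use a different convention or integrator.
    """
    # Clip the applied torque to the stated maximum.
    u = max(-cfg.max_torque, min(cfg.max_torque, torque))
    inertia = cfg.mass * cfg.length ** 2
    theta_ddot = (
        u
        - cfg.friction * theta_dot
        + cfg.mass * cfg.gravity * cfg.length * math.sin(theta)
    ) / inertia
    new_theta = theta + cfg.dt * theta_dot
    new_theta_dot = theta_dot + cfg.dt * theta_ddot
    return new_theta, new_theta_dot


def discounted_return(rewards, gamma: float) -> float:
    """Sum of gamma**t * r_t over a reward sequence."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


if __name__ == "__main__":
    cfg = PendulumConfig()
    theta, theta_dot = math.pi, 0.0  # start hanging down (assumed start state)
    rewards = []
    for _ in range(50):  # 50 steps, echoing the quoted horizon T = 50
        theta, theta_dot = step(cfg, theta, theta_dot, cfg.max_torque)
        rewards.append(reward(theta))
    print(discounted_return(rewards, cfg.gamma))
```

The constant-torque rollout in the `__main__` block is only a smoke test of the dynamics; it does not correspond to BAMCP, BAFA, or any policy evaluated in the paper.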