Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Dual Control for Approximate Bayesian Reinforcement Learning

Authors: Edgar D. Klenske, Philipp Hennig

JMLR 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on simulated systems show that this framework offers a useful approximation to the intractable aspects of Bayesian RL, producing structured exploration strategies that differ from standard RL approaches. We provide simple examples for the use of this framework in (approximate) Gaussian process regression and feedforward neural networks for the control of exploration.
Researcher Affiliation | Academia | Edgar D. Klenske EMAIL Max-Planck-Institute for Intelligent Systems Spemannstraße 38 72076 Tübingen, Germany Philipp Hennig EMAIL Max-Planck-Institute for Intelligent Systems Spemannstraße 38 72076 Tübingen, Germany
Pseudocode | Yes | Figure 2: Flow-chart of the approximate dual control algorithm to show the overall structure. Adapted from Tse and Bar-Shalom (1973). The left cycle is the inner loop, performing the nonlinear optimization.
Open Source Code | No | The paper does not contain any explicit statements or links indicating the availability of open-source code for the described methodology.
Open Datasets | No | The experiments were conducted on simulated systems, with the dynamics and parameters explicitly defined within the paper (e.g., in Section 6.2 and 6.3), rather than utilizing external, publicly available datasets.
Dataset Splits | No | The paper describes experiments on simulated dynamical systems where the system dynamics and parameters are defined within the text. It does not utilize predefined datasets that would require explicit training, testing, or validation splits.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as CPU or GPU models, or memory specifications.
Software Dependencies | No | The paper does not provide specific details about software dependencies or their version numbers used in the implementation or experimentation.
Experiment Setup | Yes | The paper provides specific experimental setup details, including system parameters and cost function weightings for each simulated experiment (e.g., 'a = 1 (known), b = 2, p(b) = N(b;1,10), Q = 10^-1, R = 0, W = 1, Λ = 1, T = 2' in Section 6.1). It also specifies parameters for the controllers, such as 'τ = 0.1' for BEB in Section 6.1, and the number and type of features for GP (30 features) and neural network models (4 logistic features).
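
The Section 6.1 parameters quoted in the Experiment Setup entry could be recorded for a re-run roughly as follows. This is a minimal sketch, not the authors' code: the class name `Section61Setup`, the field names, and the helper `update_belief_b` are hypothetical. It also assumes a scalar linear system x_next = a*x + b*u + noise, that Q is the variance of that noise, and that N(b;1,10) denotes mean 1 and variance 10; none of these readings is confirmed by the summary above. R, W, Λ, and T are stored as quoted without interpretation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Section61Setup:
    """Values quoted from Section 6.1; field names are hypothetical."""
    a: float = 1.0             # stated as known
    b_true: float = 2.0        # 'b = 2'
    b_prior_mean: float = 1.0  # from 'p(b) = N(b;1,10)', assumed to be the mean
    b_prior_var: float = 10.0  # from 'p(b) = N(b;1,10)', assumed to be the variance
    Q: float = 1e-1            # assumed here to be the process-noise variance
    R: float = 0.0             # stored as quoted, not interpreted
    W: float = 1.0             # stored as quoted, not interpreted
    Lambda: float = 1.0        # stored as quoted, not interpreted
    T: int = 2                 # stored as quoted, not interpreted
    beb_tau: float = 0.1       # 'τ = 0.1' for the BEB controller


def update_belief_b(mean, var, x, u, x_next, a, q):
    """One conjugate Gaussian update of the belief over b.

    Assumes x_next = a*x + b*u + noise with noise ~ N(0, q) and q > 0.
    """
    if u == 0.0:
        # A zero control input carries no information about b.
        return mean, var
    residual = x_next - a * x - mean * u
    gain = var * u / (var * u * u + q)
    return mean + gain * residual, var - gain * u * var


setup = Section61Setup()
# Noise-free transition from x = 0 under u = 1, using the true b.
x, u = 0.0, 1.0
x_next = setup.a * x + setup.b_true * u
post_mean, post_var = update_belief_b(
    setup.b_prior_mean, setup.b_prior_var, x, u, x_next, setup.a, setup.Q
)
```

Under these assumptions, a single informative transition moves the belief over b from the prior mean towards the true value and shrinks its variance, which is the kind of learning effect an exploration-aware controller trades off against cost.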