Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Cover Tree Bayesian Reinforcement Learning
Authors: Nikolaos Tziortziotis, Christos Dimitrakakis, Konstantinos Blekas
JMLR 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conducted two sets of experiments to analyse the offline and the online performance. We compared CTBRL with the well-known LSPI algorithm (Lagoudakis and Parr, 2003) for the offline case, as well as an online variant (Buşoniu et al., 2010) for the online case. |
| Researcher Affiliation | Academia | Department of Computer Science and Engineering University of Ioannina GR-45110, Greece; Department of Computer Science and Engineering Chalmers University of Technology SE-41296, Sweden |
| Pseudocode | Yes | Algorithm 1 CTBRL (Episodic, using Thompson sampling) |
| Open Source Code | No | The paper states: "The exact implementation is available in the Cover Tree class in Dimitrakakis et al. (2007)." This refers to a component of their method from a prior work, not the full source code for the CTBRL methodology described in this 2014 paper. There is no explicit statement or link provided for the full methodology's code. |
| Open Datasets | Yes | We consider two well-known continuous state, discrete-action, episodic domains. The first is the inverted pendulum domain and the second is the mountain car domain. |
| Dataset Splits | No | The paper describes how data was generated (e.g., "rollouts from k = {10, 20, . . . , 50, 100, . . . , 1000} states drawn from the true environment's starting distribution") and how it was evaluated ("evaluated over 1000 rollouts"). However, it does not specify traditional fixed training/test/validation splits for a pre-existing dataset with explicit percentages or sample counts; instead, it details a simulation-based, online learning setup. |
| Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU models, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions several algorithms and concepts (e.g., LSPI, LSTD, Gaussian processes) but does not provide specific software names with version numbers for reproducibility (e.g., Python 3.8, PyTorch 1.9, CPLEX 12.4). |
| Experiment Setup | Yes | The discount factor γ was 0.95. The basis we used for LSTD/LSPI was an equidistant 3×3 grid of RBFs over the state space... The discount factor is set to γ = 0.999. An equidistant 4×4 grid of RBFs over the state space plus a constant term is selected for LSTD and LSPI. We used 25 API iterations on this data. We drew 1-step transitions from a set of 3000 uniformly drawn states from the sampled model. For online-LSPI, we followed the approach of Buşoniu et al. (2010), who adopt an ϵ-greedy exploration scheme with an exponentially decaying schedule ϵ_t = ϵ_0 ϵ_d^t, with ϵ_0 = 1. In preliminary experiments, we found ϵ_d = 0.997 to be a reasonable compromise. |
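The exploration schedule quoted in the experiment setup is a standard exponentially decaying ϵ-greedy rule. The sketch below illustrates it with the reported constants (ϵ_0 = 1, ϵ_d = 0.997); the function and action-selection helper are illustrative, not code from the paper.

```python
import random

# Exponentially decaying exploration rate, eps_t = eps_0 * eps_d**t,
# with eps_0 = 1 and eps_d = 0.997 as reported for online-LSPI.
def epsilon(t, eps0=1.0, eps_d=0.997):
    return eps0 * eps_d ** t

# Hypothetical epsilon-greedy action selection over a list of Q-values:
# explore with probability epsilon(t), otherwise act greedily.
def epsilon_greedy(q_values, t, rng=random):
    if rng.random() < epsilon(t):
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)
```

With these constants, exploration starts fully random (ϵ_0 = 1) and decays by a factor of 0.997 per step, so roughly 230 steps halve the exploration probability.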