Kernel-Based Reinforcement Learning: A Finite-Time Analysis
Authors: Omar Darwiche Domingues, Pierre Menard, Matteo Pirotta, Emilie Kaufmann, Michal Valko
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically validate our approach in continuous MDPs with sparse rewards. |
| Researcher Affiliation | Collaboration | 1Inria Lille, 2Université de Lille, 3Otto von Guericke University, 4Facebook AI Research Paris, 5CNRS, 6DeepMind Paris. |
| Pseudocode | Yes | Algorithm 1 (Kernel-UCBVI) and Algorithm 2 (optimistic Q computation). |
| Open Source Code | Yes | Implementations of Kernel-UCBVI are available on GitHub and use the rlberry library (Domingues et al., 2021). The reference provides the link: https://github.com/rlberry-py/rlberry |
| Open Datasets | No | The paper describes a custom Grid-World environment (Section 7) but does not provide concrete access information (link, DOI, specific repository, or formal citation for a public dataset) for it. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits or percentages. It operates in an episodic reinforcement learning setting. |
| Hardware Specification | No | The paper does not specify any hardware details like GPU/CPU models, memory, or specific computing infrastructure used for the experiments. |
| Software Dependencies | No | The paper mentions "rlberry library" but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | We used the Euclidean distance and the Gaussian kernel with a fixed bandwidth σ = 0.025, matching the granularity of the uniform discretization used by some of the baselines. We ran the algorithms for 5 × 10^4 episodes. |