Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Kernel-Based Reinforcement Learning in Robust Markov Decision Processes
Authors: Shiau Hong Lim, Arnaud Autef
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we demonstrate that the better performance bound does translate into solutions that perform better, especially when there is a model mismatch between the training and the testing environments. |
| Researcher Affiliation | Collaboration | ¹IBM Research, Singapore; ²Applied Mathematics department, Ecole polytechnique, France. Work accomplished while working at IBM Research, Singapore. |
| Pseudocode | Yes | Algorithm 1 Robust kernel-based value iteration... Algorithm 2 Robust kernel-based value iteration, II |
| Open Source Code | Yes | The complete source code for the implementation of our algorithm as well as the task environments are provided in the supplementary material. |
| Open Datasets | No | For Puddle World, ... We follow the strategy of (Barreto et al., 2016) in creating the training set Da by running a random policy on 10 training episodes... The representative states for φ are then created by running K-means on the training states. |
| Dataset Splits | No | The paper mentions 'best-performing training set and kernel parameters are chosen' but does not specify a distinct validation split (e.g., 80/10/10 or similar percentages/counts) for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific hardware details (like CPU/GPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'Gaussian kernel' and '4-th order Runge-Kutta method' for simulation but does not list specific software dependencies with version numbers (e.g., Python 3.x, TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | Our value iteration is stopped when ‖w_{t+1} − w_t‖ < 0.001 or after 100 iterations, whichever happens earlier. We use γ = 0.99 for all our tasks. ... For the bandwidth parameters, we employ a wide range during training, from the set {exp(−8), exp(−7), ..., exp(3)}. This results in 144 pairs of (σψ, σφ), and we always choose the best-performing pair based on 30 independent test episodes. |
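
The Experiment Setup row can be checked with a short sketch. The snippet below is illustrative only and is not taken from the paper's supplementary code: it enumerates the bandwidth grid exp(−8) through exp(3), confirms that pairing the grid with itself gives 144 (σψ, σφ) combinations, and encodes the quoted stopping rule. All names (`BANDWIDTHS`, `BANDWIDTH_PAIRS`, `should_stop`) are hypothetical, and the max norm used to measure the change in weights is an assumption, since the quoted text does not state which norm is used.

```python
import itertools
import math

# Bandwidth grid quoted in the table: exp(-8), exp(-7), ..., exp(3).
EXPONENTS = range(-8, 4)  # 12 integer exponents, -8 through 3 inclusive
BANDWIDTHS = [math.exp(k) for k in EXPONENTS]

# Every (sigma_psi, sigma_phi) combination drawn from the same grid.
BANDWIDTH_PAIRS = list(itertools.product(BANDWIDTHS, repeat=2))

# Settings quoted in the table.
GAMMA = 0.99
TOLERANCE = 0.001
MAX_ITERATIONS = 100


def should_stop(w_next, w_prev, iteration):
    """Hypothetical stopping check for value iteration.

    Assumption: the change between iterates is measured with the max norm;
    the source does not specify the norm.
    """
    change = max(abs(a - b) for a, b in zip(w_next, w_prev))
    return change < TOLERANCE or iteration >= MAX_ITERATIONS


if __name__ == "__main__":
    print(len(BANDWIDTHS), len(BANDWIDTH_PAIRS))
```

Twelve grid values per bandwidth give 12 × 12 = 144 pairs, which matches the count reported in the table.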