Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Online Robust Reinforcement Learning Through Monte-Carlo Planning
Authors: Tuan Quang Dam, Kishan Panaganti, Brahim Driss, Adam Wierman
ICML 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we provide empirical evidence that our method achieves robust performance in planning problems even under significant ambiguity in the underlying reward distribution and transition dynamics. Our contributions are threefold: Robust Empirical Performance: We conduct experiments in two environments (Gambler's Problem and Frozen Lake) to evaluate our robust algorithm, demonstrating that it achieves superior robust performance to model mismatches than the standard MCTS algorithm baseline. |
| Researcher Affiliation | Academia | ¹Hanoi University of Science and Technology, Hanoi, Vietnam; ²Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA; ³Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189-CRIStAL. |
| Pseudocode | Yes | Algorithm 1: Robust-Power-UCT with γ discount factor. |
| Open Source Code | Yes | We also provide our code at https://github.com/brahimdriss/Robust MCTS. |
| Open Datasets | Yes | The Gambler's Problem (Sutton and Barto, 2018): a classic casino-inspired reinforcement learning environment... Frozen Lake (Towers et al., 2024): This environment presents a gridworld navigation challenge... |
| Dataset Splits | No | The paper uses simulated environments (Gambler's Problem, Frozen Lake, American Option Pricing) rather than traditional datasets with explicit train/test/validation splits. It describes evaluation scenarios under different planning and execution probabilities and uses '100 seeds' for experiments, but does not specify dataset splits in the conventional sense. |
| Hardware Specification | No | This work was granted access to the HPC resources of IDRIS under the allocation 2024-AD011015599 made by GENCI. This mentions a High-Performance Computing resource but lacks specific hardware details such as CPU/GPU models or memory. |
| Software Dependencies | No | We implement our robust MCTS framework by extending a base Monte Carlo Tree Search implementation from (Leurent, 2018). While a software implementation is mentioned, no specific version number for 'rl-agents' or any other libraries is provided. |
| Experiment Setup | Yes | All experiments are done over 100 seeds, using γ = 0.99 and robustness budget ρ = 0.5, with these values showing consistent performance across preliminary experiments with different parameter settings. We use 2000 rollouts for The Gambler's Problem and 4000 rollouts for Frozen Lake. |
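
The reported experiment setup can be captured in a small configuration object, which may help when trying to reproduce the runs. The following is a minimal sketch and not the authors' code: the class name `ExperimentSetup`, the field names, the dictionary keys, and the choice of seeds 0 through 99 are all assumptions made for illustration. Only the numeric values (100 seeds, γ = 0.99, ρ = 0.5, 2000 and 4000 rollouts) come from the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Hypothetical container for the settings reported in the table."""

    env_name: str
    n_rollouts: int        # planning rollouts per decision, as reported
    gamma: float = 0.99    # discount factor reported in the paper
    rho: float = 0.5       # robustness budget reported in the paper
    n_seeds: int = 100     # number of seeds reported in the paper

    def seeds(self) -> range:
        # Assumption: seeds are 0..n_seeds-1; the source only says "100 seeds".
        return range(self.n_seeds)


# Dictionary keys are illustrative labels, not identifiers from the paper's code.
SETUPS = {
    "gamblers_problem": ExperimentSetup("Gambler's Problem", n_rollouts=2000),
    "frozen_lake": ExperimentSetup("Frozen Lake", n_rollouts=4000),
}

if __name__ == "__main__":
    for key, setup in SETUPS.items():
        print(key, setup.n_rollouts, setup.gamma, setup.rho, len(setup.seeds()))
```

Freezing the dataclass keeps the shared defaults (γ, ρ, seed count) from being changed by accident between environments, so only the rollout budget differs per entry.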