Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Online Robust Reinforcement Learning Through Monte-Carlo Planning

Authors: Tuan Quang Dam, Kishan Panaganti, Brahim Driss, Adam Wierman

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we provide empirical evidence that our method achieves robust performance in planning problems even under significant ambiguity in the underlying reward distribution and transition dynamics. Our contributions are threefold: Robust Empirical Performance: We conduct experiments in two environments (Gambler's Problem and Frozen Lake) to evaluate our robust algorithm, demonstrating that it achieves superior robust performance to model mismatches than the standard MCTS algorithm baseline.
Researcher Affiliation	Academia	1 Hanoi University of Science and Technology, Hanoi, Vietnam; 2 Department of Computing and Mathematical Sciences, California Institute of Technology, Pasadena, CA, USA; 3 Univ. Lille, Inria, CNRS, Centrale Lille, UMR 9189-CRIStAL.
Pseudocode	Yes	Algorithm 1: Robust-Power-UCT with γ discount factor.
Open Source Code	Yes	We also provide our code at https://github.com/brahimdriss/Robust MCTS.
Open Datasets	Yes	The Gambler's Problem (Sutton and Barto, 2018): a classic casino-inspired reinforcement learning environment... Frozen Lake (Towers et al., 2024): This environment presents a gridworld navigation challenge...
Dataset Splits	No	The paper uses simulated environments (Gambler's Problem, Frozen Lake, American Option Pricing) rather than traditional datasets with explicit train/test/validation splits. It describes evaluation scenarios under different planning and execution probabilities and uses '100 seeds' for experiments, but does not specify dataset splits in the conventional sense.
Hardware Specification	No	This work was granted access to the HPC resources of IDRIS under the allocation 2024-AD011015599 made by GENCI. This mentions a High-Performance Computing resource but lacks specific hardware details such as CPU/GPU models or memory.
Software Dependencies	No	We implement our robust MCTS framework by extending a base Monte Carlo Tree Search implementation from (Leurent, 2018). While a software implementation is mentioned, no specific version number for 'rl-agents' or any other libraries is provided.
Experiment Setup	Yes	All experiments are done over 100 seeds, using γ = 0.99 and robustness budget ρ = 0.5, with these values showing consistent performance across preliminary experiments with different parameter settings. We use 2000 rollouts for the Gambler's Problem and 4000 rollouts for Frozen Lake.
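
The settings quoted in the Experiment Setup row can be collected into a small configuration object, which may help when trying to rerun the experiments. The following is only a sketch under stated assumptions: the class name `ExperimentConfig`, the dictionary keys, and the choice of seed values 0 to 99 are hypothetical and do not come from the paper or its repository; only the numeric values (100 seeds, γ = 0.99, ρ = 0.5, 2000 and 4000 rollouts) are taken from the quoted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container for the settings quoted in the table."""
    env_name: str
    n_rollouts: int          # planning rollouts per decision, as reported
    n_seeds: int = 100       # "All experiments are done over 100 seeds"
    gamma: float = 0.99      # discount factor
    rho: float = 0.5         # robustness budget


# Keys are illustrative labels, not identifiers from the authors' code.
CONFIGS = {
    "gambler": ExperimentConfig(env_name="Gambler's Problem", n_rollouts=2000),
    "frozen_lake": ExperimentConfig(env_name="Frozen Lake", n_rollouts=4000),
}


def seed_list(cfg: ExperimentConfig) -> list[int]:
    # Assumption: seeds are simply 0..n_seeds-1; the paper does not list them.
    return list(range(cfg.n_seeds))


if __name__ == "__main__":
    for key, cfg in CONFIGS.items():
        print(key, cfg.n_rollouts, cfg.gamma, cfg.rho, len(seed_list(cfg)))
```

Freezing the dataclass keeps the reported values from being changed accidentally during a run; any setting not quoted in the table (for example, planning and execution probabilities) would need to be taken from the paper itself.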