Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Near-Optimal Sample Complexity for MDPs via Anchoring

Authors: Jongmin Lee, Mario Bravo, Roberto Cominetti

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Theoretical We study a new model-free algorithm to compute ε-optimal policies for average reward Markov decision processes... Our procedure combines a recursive sampling technique with Halpern s anchored iteration, and computes an ε-optimal policy with sample and time complexity... To our knowledge, this is the best complexity among model-free algorithms, matching the known lower bound up to a factor h sp. This paper focuses on the theoretical aspects of reinforcement learning. There are no societal impacts that we anticipate from our theoretical result.
Researcher Affiliation Academia 1Department of Mathematical Sciences, Seoul National University, Seoul, Korea 2Facultad de Ingeniería y Ciencias, Universidad Adolfo Ibáñez, Santiago, Chile 3Institute for Mathematical & Computational Engineering and Department of Industrial and Systems Engineering, Pontificia Universidad Católica de Chile, Santiago, Chile. Correspondence to: Jongmin Lee <EMAIL>, Mario Bravo <EMAIL>, Roberto Cominetti <EMAIL>.
Pseudocode Yes Our algorithms, to be presented in next section, will use as a subroutine the following generic sampling procedure. Algorithm 1 SAMPLE(d, m) ... The following SAVIA iteration is our basic algorithm... Algorithm 2 SAVIA(Q0, n, ε, δ) ... Specifically, we consider the SAVIA+ iteration described in Algorithm 3. Algorithm 3 SAVIA+(Q0, ε, δ)
Open Source Code No No explicit statement about code availability or links to repositories for the methodology described in this paper were found.
Open Datasets No The paper introduces a new model-free algorithm and analyzes its theoretical properties, particularly sample and time complexity. It describes a 'generative model that provides independent samples' as part of the theoretical framework, but does not use or refer to any specific publicly available or open datasets for empirical evaluation.
Dataset Splits No The paper is theoretical and does not conduct experiments with specific datasets. Therefore, it does not provide any information regarding dataset splits (training, testing, validation).
Hardware Specification No The paper is theoretical, focusing on algorithm design and complexity analysis. It does not mention running any experiments and therefore does not specify hardware used for computations.
Software Dependencies No The paper presents theoretical algorithms and their complexity analysis, without describing any practical implementation or experiments. Consequently, no specific software dependencies or version numbers are mentioned.
Experiment Setup No The paper is theoretical, focusing on the design and complexity analysis of the SAVIA and SAVID algorithms. It does not describe any empirical experiments, and thus does not provide details on experimental setup such as hyperparameter values, training configurations, or system-level settings.