Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Global Convergence of Policy Gradient in Average Reward MDPs

Authors: Navdeep Kumar, Yashaswini Murthy, Itai Shufaro, Kfir Y Levy, R. Srikant, Shie Mannor

ICLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We also present simulations that empirically validate the result."
Researcher Affiliation | Collaboration | Navdeep Kumar (Electrical and Computer Engineering, Technion - Israel Institute of Technology); Yashaswini Murthy (ECE & CSL, University of Illinois Urbana-Champaign); Shie Mannor (Electrical Engineering, Technion - Israel Institute of Technology; NVIDIA Research)
Pseudocode | No | The paper presents mathematical derivations and theoretical results; it does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper neither states that the code is open-sourced nor provides a link to a code repository or a mention of code in supplementary materials.
Open Datasets | No | The paper describes how the MDPs are constructed and how transition kernels are randomly generated for the simulations (e.g., "We construct the transition kernel and the reward function in the same manner for all MDPs..." and "We randomly generate a transition kernel..."), rather than using pre-existing, publicly available datasets with access information.
Dataset Splits | No | The simulation section details the construction of Markov Decision Processes for empirical validation, varying parameters such as state- and action-space cardinalities and reward functions; it does not involve standard datasets with explicit training, validation, or test splits.
Hardware Specification | No | The paper provides no details about the hardware (e.g., GPU/CPU models, memory) used to run the simulations or experiments.
Software Dependencies | No | The paper does not list any software dependencies or version numbers, such as programming languages, libraries, or frameworks, used for the implementation or experiments.
Experiment Setup | No | The paper describes aspects of the simulation setup, such as the number of iterations ("Projected policy gradient was implemented for 2000 iterations") and the construction of the MDPs. However, it does not give concrete hyperparameters for the policy gradient algorithm, such as the learning rate used in the simulations, or other training configurations.
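The table notes that the paper runs projected policy gradient for 2000 iterations on randomly generated transition kernels, but reports no learning rate or construction code. The following is a minimal sketch of that kind of experiment, not the authors' implementation: the Dirichlet kernel generation, the state/action sizes, and the step size `eta` are illustrative assumptions, and the exact average-reward gradient is computed in closed form via the stationary distribution and the differential value function.

```python
import numpy as np

def project_simplex(v):
    # Euclidean projection of v onto the probability simplex (sort-based algorithm).
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    k = np.nonzero(u - css / (np.arange(len(v)) + 1) > 0)[0][-1]
    theta = css[k] / (k + 1.0)
    return np.maximum(v - theta, 0.0)

rng = np.random.default_rng(0)
S, A = 5, 3                                    # illustrative sizes (assumption)
P = rng.dirichlet(np.ones(S), size=(S, A))     # P[s, a] is a distribution over next states
r = rng.uniform(size=(S, A))                   # rewards in [0, 1]

def avg_reward_and_grad(pi):
    # Average reward rho(pi) and its exact gradient for a direct (tabular) policy.
    P_pi = np.einsum('sa,sat->st', pi, P)      # state transition matrix under pi
    r_pi = np.einsum('sa,sa->s', pi, r)        # expected reward per state under pi
    w, vl = np.linalg.eig(P_pi.T)              # stationary distribution: left eigvec for eigval 1
    d = np.real(vl[:, np.argmin(np.abs(w - 1.0))])
    d = d / d.sum()
    rho = d @ r_pi
    # Differential value V solves (I - P_pi) V = r_pi - rho*1 with d @ V = 0.
    V = np.linalg.solve(np.eye(S) - P_pi + np.outer(np.ones(S), d), r_pi - rho)
    Q = r - rho + P @ V                        # differential Q-function
    return rho, d[:, None] * Q                 # grad wrt pi(a|s) is d(s) * Q(s, a)

pi = np.full((S, A), 1.0 / A)                  # start from the uniform policy
eta = 0.1                                      # step size: arbitrary illustrative choice
rho_history = []
for _ in range(2000):                          # iteration count taken from the paper's quote
    rho, grad = avg_reward_and_grad(pi)
    rho_history.append(rho)
    for s in range(S):                         # projected gradient ascent step, row by row
        pi[s] = project_simplex(pi[s] + eta * grad[s])
```

Because every Dirichlet-sampled row of the kernel has full support, the induced chain stays ergodic for any policy, so the stationary distribution and the linear solve above are well defined throughout the run.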