Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Efficient Policy Optimization in Robust Constrained MDPs with Iteration Complexity Guarantees
Authors: Sourav Ganguly, Kishan Panaganti, Arnob Ghosh, Adam Wierman
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our algorithms² on two environments: (i) Garnet and (ii) Constrained Riverswim (CRS). Additional experimental results are provided in Appendix E. ... Our experimental results demonstrate that RNPG consistently outperforms EPIRCPGS in both environments. In fact, for the CRS environment (Figure 2), RNPG is the only one that produces a feasible policy. |
| Researcher Affiliation | Academia | Sourav Ganguly, Department of ECE, New Jersey Institute of Technology, New Jersey, USA; Kishan Panaganti, Department of CMS, California Institute of Technology (now at Tencent AI Lab, Seattle, WA); Arnob Ghosh, Department of ECE, New Jersey Institute of Technology, New Jersey, USA; Adam Wierman, Department of CMS, California Institute of Technology, California, USA |
| Pseudocode | Yes | The complete procedure is described in Algorithm 1. First, we evaluate $J^{\pi_t}_{c_i}$ and $\nabla_{\pi_t} J^{\pi_t}_{c_i}$ using the robust policy evaluator, which we describe in the following. Algorithms: Algorithm 1, Robust-Natural Policy Gradient for constrained MDP (RNPG); Algorithm 2, KL Uncertainty Evaluator; Algorithm 3, Robust Q-table; Algorithm 4, Robust-Projected Policy Gradient for CMDP with uncertainties (R-PPG). |
| Open Source Code | Yes | Footnote 2: The complete implementation is available at https://github.com/Sourav1429/RCAC_NPG.git |
| Open Datasets | Yes | We evaluate our algorithms² on two environments: (i) Garnet and (ii) Constrained Riverswim (CRS). ... The vanilla Frozen-lake problem can be found in the gymnasium library [59]. |
| Dataset Splits | No | The Frozen Lake environment is modeled as a $d \times d$ grid world, where the agent begins its journey at the top-left corner, $s_0 = (0, 0)$, and aims to reach the bottom-right goal state $s_{d^2-1} = (d-1, d-1)$. ... Whether a grid cell is an obstacle is decided randomly at the beginning of an episode. |
| Hardware Specification | Yes | Footnote 1: The system specifications are: processor, Intel(R) Core(TM) i7-14700 at 2.10 GHz; installed RAM, 32.0 GB (31.7 GB usable); 64-bit operating system, x64-based processor; no GPU. |
| Software Dependencies | No | We finally project the resulting value onto the policy-space simplex $\Pi$. To perform the projection, we find $\arg\min_{\pi} \lVert \pi - (\pi_t - \alpha_t \nabla_{\pi_t} J_i(\pi_t)) \rVert^2$ subject to $\pi \in \Pi$. However, this process is cumbersome; hence, we can leverage the cvxpy package from Python to optimally solve the update equation. ... The vanilla Frozen-lake problem can be found in the gymnasium library [59]. |
| Experiment Setup | Yes | Common hyperparameters: The initial state distribution, denoted by $\rho$, is generated by sampling from a standard normal distribution and then applying a softmax transformation to convert the resulting values into a valid probability distribution over states. ... The learning rate $\alpha$ is set to $10^{-3}$ for all algorithms across all environments. Another important hyperparameter is the loop control variable $\tau$, used in Algorithm 3. ... The hyperparameters used for this environment are listed in Table 5. |
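The Research Type row names Garnet as one of the two evaluation environments. Garnet problems are randomly generated MDPs. The sketch below is a minimal, hypothetical generator assuming the common construction in which each state-action pair transitions to `branching` distinct successor states with random probabilities and receives a uniform random cost. The function name `make_garnet`, the cost model, and all parameter values are illustrative assumptions, not taken from the paper or its repository.

```python
import random


def make_garnet(n_states, n_actions, branching, seed=0):
    """Hypothetical Garnet(n_states, n_actions, branching) generator.

    Returns (P, C) where P[s][a] is a length-n_states list of next-state
    probabilities and C[s][a] is an immediate cost drawn from U[0, 1).
    """
    if not 1 <= branching <= n_states:
        raise ValueError("branching must be in [1, n_states]")
    rng = random.Random(seed)
    P, C = [], []
    for _ in range(n_states):
        P_s, C_s = [], []
        for _ in range(n_actions):
            # Choose `branching` distinct successor states.
            successors = rng.sample(range(n_states), branching)
            # Split [0, 1] at branching - 1 uniform cut points to get probabilities.
            cuts = sorted(rng.random() for _ in range(branching - 1))
            probs = [hi - lo for lo, hi in zip([0.0] + cuts, cuts + [1.0])]
            row = [0.0] * n_states
            for s_next, p in zip(successors, probs):
                row[s_next] = p
            P_s.append(row)
            C_s.append(rng.random())  # assumed cost model
        P.append(P_s)
        C.append(C_s)
    return P, C


P, C = make_garnet(n_states=5, n_actions=3, branching=2, seed=1)
print(len(P), len(P[0]), round(sum(P[0][0]), 6))  # 5 3 1.0
```

Using random cut points of the unit interval guarantees that each row sums to one without a separate normalization step.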
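The Pseudocode row lists a KL Uncertainty Evaluator (Algorithm 2) that supplies the robust values $J^{\pi_t}_{c_i}$. The sketch below is not the paper's Algorithm 2. It assumes a KL-ball uncertainty set $\{P : D_{\mathrm{KL}}(P \,\Vert\, P_0) \le \delta\}$ around a nominal kernel $P_0$, costs that an adversary maximizes, and the standard convex dual $\sup_{P} \mathbb{E}_P[V] = \inf_{\beta > 0} \{\beta \log \mathbb{E}_{P_0}[e^{V/\beta}] + \beta\delta\}$, evaluated on a fixed grid of $\beta$ values. The function names, the $\beta$ grid, and the iteration count are hypothetical.

```python
import math


def kl_worst_case_expectation(values, nominal, delta, betas=None):
    """Approximate sup of E_P[values] over {P : KL(P || nominal) <= delta}.

    Uses the dual  inf_{beta > 0} beta * log E_nominal[exp(values / beta)] + beta * delta,
    minimised over a finite grid of beta (so the result can only over-estimate
    the dual optimum), and capped at the largest value in the nominal support.
    """
    if betas is None:
        betas = [10 ** (k / 10) for k in range(-30, 31)]  # 1e-3 ... 1e3 (assumed grid)
    support = [(v, p) for v, p in zip(values, nominal) if p > 0]
    v_max = max(v for v, _ in support)
    best = v_max  # limit of the dual objective as beta -> 0
    for beta in betas:
        # Shift by v_max inside the exponent for numerical stability.
        s = sum(p * math.exp((v - v_max) / beta) for v, p in support)
        best = min(best, v_max + beta * math.log(s) + beta * delta)
    return best


def robust_policy_evaluation(P0, C, pi, gamma, delta, iters=300):
    """Iterate the robust Bellman operator for policy pi under KL uncertainty.

    P0[s][a]: nominal next-state distribution, C[s][a]: cost, pi[s][a]: action probability.
    """
    n_states = len(P0)
    V = [0.0] * n_states
    for _ in range(iters):
        V = [
            sum(
                pi[s][a] * (C[s][a] + gamma * kl_worst_case_expectation(V, P0[s][a], delta))
                for a in range(len(pi[s]))
            )
            for s in range(n_states)
        ]
    return V


P0 = [[[0.9, 0.1], [0.5, 0.5]], [[0.2, 0.8], [0.5, 0.5]]]
C = [[1.0, 0.5], [0.0, 0.2]]
pi = [[0.5, 0.5], [0.5, 0.5]]
V_nominal = robust_policy_evaluation(P0, C, pi, gamma=0.9, delta=0.0)
V_robust = robust_policy_evaluation(P0, C, pi, gamma=0.9, delta=0.1)
print(V_nominal, V_robust)  # worst-case costs are never below the nominal ones
```

Setting $\delta = 0$ approximately recovers nominal policy evaluation, up to the resolution of the $\beta$ grid; a larger $\delta$ lets the adversary shift more transition mass toward high-cost successors.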
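The Dataset Splits row describes the Frozen Lake variant as a $d \times d$ grid with start $s_0 = (0, 0)$ and goal $s_{d^2-1} = (d-1, d-1)$, with obstacles redrawn at the start of each episode. Both endpoint labels are consistent with row-major indexing, $s_k = (\lfloor k/d \rfloor, k \bmod d)$. The sketch assumes that indexing, an independent per-cell obstacle probability `p_obstacle` (hypothetical; the quoted text does not say how obstacles are drawn), and that the start and goal cells are never obstacles.

```python
import random


def state_to_cell(k, d):
    """Row-major map from state index k to grid cell (row, col)."""
    return divmod(k, d)


def cell_to_state(row, col, d):
    """Inverse of state_to_cell."""
    return row * d + col


def sample_obstacles(d, p_obstacle=0.2, rng=None):
    """Hypothetical per-episode obstacle map.

    Every cell other than the start and goal is blocked independently
    with probability p_obstacle (an assumed value).
    """
    rng = rng or random.Random()
    start, goal = (0, 0), (d - 1, d - 1)
    return {
        (r, c)
        for r in range(d)
        for c in range(d)
        if (r, c) not in (start, goal) and rng.random() < p_obstacle
    }


d = 4
print(state_to_cell(0, d), state_to_cell(d * d - 1, d))  # (0, 0) (3, 3)
obstacles = sample_obstacles(d, rng=random.Random(0))  # redrawn at each episode start
```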
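The Software Dependencies row solves the projection $\arg\min_{\pi \in \Pi} \lVert \pi - (\pi_t - \alpha_t \nabla_{\pi_t} J_i(\pi_t)) \rVert^2$ with cvxpy. Because $\Pi$ is a product of per-state probability simplices and the squared Euclidean norm separates across states, the projection decomposes state by state, and each piece has a well-known exact sort-based solution. The sketch below swaps in that closed-form projection for the cvxpy solve. It is not the paper's code, and the descent sign (treating $J_i$ as a cost to be reduced) and the names `project_to_simplex` and `projected_gradient_step` are assumptions.

```python
def project_to_simplex(v):
    """Exact Euclidean projection of v onto {x : x_i >= 0, sum_i x_i = 1}."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for j, u_j in enumerate(u, start=1):
        cumsum += u_j
        t = (cumsum - 1.0) / j
        if u_j - t > 0:  # holds for a prefix of j; keep the last threshold
            theta = t
    return [max(x - theta, 0.0) for x in v]


def projected_gradient_step(pi, grad, lr):
    """pi[s] <- Proj_simplex(pi[s] - lr * grad[s]) for every state s (assumed descent on a cost)."""
    return [
        project_to_simplex([p - lr * g for p, g in zip(pi_s, grad_s)])
        for pi_s, grad_s in zip(pi, grad)
    ]


pi = [[0.5, 0.5], [0.2, 0.8]]
grad = [[1.0, -1.0], [0.0, 0.0]]
print(projected_gradient_step(pi, grad, lr=1e-3))
```

The step size `lr=1e-3` mirrors the learning rate quoted in the Experiment Setup row. The sort-based projection costs $O(|A| \log |A|)$ per state, which avoids calling a general-purpose convex solver at every iteration.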
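The Experiment Setup row states that the initial state distribution $\rho$ is obtained by drawing standard-normal values and applying a softmax. A minimal standard-library sketch follows; the function name and the seeding scheme are illustrative assumptions.

```python
import math
import random


def sample_initial_distribution(n_states, seed=None):
    """rho = softmax(z) with z_s ~ N(0, 1) drawn independently for each state."""
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(n_states)]
    z_max = max(z)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(v - z_max) for v in z]
    total = sum(weights)
    return [w / total for w in weights]


rho = sample_initial_distribution(n_states=8, seed=0)
print(round(sum(rho), 12))  # 1.0
```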