Last-Iterate Global Convergence of Policy Gradients for Constrained Reinforcement Learning
Authors: Alessandro Montenegro, Marco Mussi, Matteo Papini, Alberto Maria Metelli
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we numerically validate our algorithms on constrained control problems, and compare them with state-of-the-art baselines, demonstrating their effectiveness. |
| Researcher Affiliation | Academia | Alessandro Montenegro Politecnico di Milano, Milan, Italy alessandro.montenegro@polimi.it Marco Mussi Politecnico di Milano, Milan, Italy marco.mussi@polimi.it Matteo Papini Politecnico di Milano, Milan, Italy matteo.papini@polimi.it Alberto Maria Metelli Politecnico di Milano, Milan, Italy albertomaria.metelli@polimi.it |
| Pseudocode | Yes | Algorithms. Both algorithms, whose pseudo-codes are deferred to Appendix A, aim at solving the RCOP of Equation (11), finding the best feasible (hyper)policy parameterization. |
| Open Source Code | Yes | The code to run the experiments in this paper is available at https://github.com/MontenegroAlessandro/MagicRL. |
| Open Datasets | Yes | In our experiments, we consider a Cost LQR environment whose main characteristics are reported in Table 5. ... For our experiments on risk minimization, we utilized environments from the MuJoCo control suite (Todorov et al., 2012)... |
| Dataset Splits | No | The paper describes batch sizes (N) for collecting trajectories during learning, and for NPG-PD2 and RPG-PD2, it mentions 'N1 = 500 were used for the inner critic-loop, while N2 = 100 for performance and cost estimations.' However, it does not specify explicit training/validation/test dataset splits in the conventional sense, as is common in supervised learning. |
| Hardware Specification | Yes | All the experiments were run on a 2019 16-inch MacBook Pro. The machine was equipped as follows: CPU Intel Core i7 (6 cores, 2.6 GHz), 16 GB 2667 MHz DDR4 RAM, GPU Intel UHD Graphics 630 (1536 MB). |
| Software Dependencies | No | The paper mentions the use of the 'Adam (Kingma and Ba, 2015) scheduler' and the 'MuJoCo control suite (Todorov et al., 2012)' but does not provide specific version numbers for these or other software dependencies such as programming languages or machine learning frameworks. |
| Experiment Setup | Yes | In particular, for both C-PGAE and NPG-PD, we employed ζθ = 0.01 and ζλ = 0.1, while for RPG-PD we selected ζθ = 0.01 and ζλ = 0.01. For C-PGAE and RPG-PD we used a regularization constant ω = 10⁻⁴. All the details about the experimental setting are summarized in Table 6. |
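The hyperparameters quoted above can be gathered into a single configuration sketch. Only the numeric values (ζθ, ζλ, ω) come from the paper; the dictionary keys, structure, and helper function are illustrative assumptions, not the authors' actual configuration format:

```python
# Hedged sketch of the hyperparameter values reported in the paper
# (Table 6 excerpt). Key names and layout are assumptions for illustration.
HYPERPARAMS = {
    "C-PGAE": {"zeta_theta": 0.01, "zeta_lambda": 0.1, "omega": 1e-4},
    "NPG-PD": {"zeta_theta": 0.01, "zeta_lambda": 0.1},
    "RPG-PD": {"zeta_theta": 0.01, "zeta_lambda": 0.01, "omega": 1e-4},
}


def step_sizes(algorithm: str) -> tuple[float, float]:
    """Return (primal step size, dual step size) for a named algorithm."""
    cfg = HYPERPARAMS[algorithm]
    return cfg["zeta_theta"], cfg["zeta_lambda"]


print(step_sizes("RPG-PD"))  # -> (0.01, 0.01)
```

Such a table makes it easy to check that the regularized variants (C-PGAE, RPG-PD) are the only ones carrying the ω = 10⁻⁴ regularization constant.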