Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning
Authors: Ruida Zhou, Tao Liu, Dileep Kalathil, P. R. Kumar, Chao Tian
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, the ARNPG-guided algorithms also demonstrate superior performance compared to some existing policy gradient-based approaches in both exact gradients and sample-based scenarios. In addition to the theoretical advantages, we provide the results of extensive experimentation in Section 5 and Appendices A and B which demonstrate that the ARNPG-guided algorithms provide superior performance in exact gradient and sample-based tabular scenarios, as well as actor-critic deep RL scenarios, compared to several existing policy gradient-based approaches. |
| Researcher Affiliation | Academia | Ruida Zhou Texas A&M University Tao Liu Texas A&M University Dileep Kalathil Texas A&M University P. R. Kumar Texas A&M University Chao Tian Texas A&M University |
| Pseudocode | Yes | Algorithm 1: Inner Loop( rk, k, , , tk); Algorithm 2: ARNPG Implicit Mirror Descent (ARNPG-IMD); Algorithm 3: ARNPG with Extra Primal Dual (ARNPG-EPD); Algorithm 4: ARNPG with Optimistic Mirror Descent Ascent (ARNPG-OMDA) |
| Open Source Code | Yes | We provide code at https://github.com/tliu1997/ARNPG-MORL. |
| Open Datasets | Yes | To demonstrate the efficacy of ARNPG-EPD on complex tasks, we have conducted experiments on the Acrobot-v1 environment from Open AI Gym [9]. |
| Dataset Splits | No | The paper does not explicitly specify train/validation/test splits, percentages, or sample counts. While it refers to 'randomly generated CMDP' and 'Acrobot-v1 environment', and mentions following settings from another paper, it does not detail the data splitting methodology within its own text. |
| Hardware Specification | Yes | For the sample-based tabular CMDP and Acrobot-v1 experiments, we used a single NVIDIA GeForce RTX 3090 GPU for each run. |
| Software Dependencies | No | The paper mentions environments like 'Open AI Gym' and implies the use of a deep learning framework given 'actor-critic deep RL scenarios', but it does not specify version numbers for any software dependencies (e.g., Python, PyTorch, Gym). |
| Experiment Setup | Yes | Experimental details on CMDP are postponed to Appendix A and further experiments on smooth concave scalarization and max-min trade-off are presented in Appendix B. For all algorithms, we choose the learning rate α = 0.001. For ARNPG-EPD, we set the inner loop iterations tk = 5. For both problems, we set the discount factor γ = 0.99. |
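
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration object for anyone attempting a rerun. The following is a minimal sketch, not the authors' code: the class name `ReportedSetup` and its field names are hypothetical, and only the three numeric values come from the quoted text. The row does not make clear which experiments each value applies to, so check the paper and its appendices before relying on them.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""

    learning_rate: float = 0.001   # alpha, quoted as chosen "for all algorithms"
    inner_loop_iters: int = 5      # t_k, quoted for ARNPG-EPD
    discount_factor: float = 0.99  # gamma, quoted "for both problems"


setup = ReportedSetup()

# Print the values as a plain dict, e.g. for logging alongside a run.
print(asdict(setup))
```

Freezing the dataclass prevents the reported values from being changed accidentally during a run; any deviation from them then has to be made explicitly by constructing a new instance.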