Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation
Authors: Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement the proposed RNAC in multiple MuJoCo environments (Hopper-v3, Walker2d-v3, and HalfCheetah-v3), and demonstrate that RNAC with the proposed uncertainty sets results in robust behavior while canonical policy-based approaches suffer significant performance degradation. We also test RNAC on TurtleBot [5], a real-world mobile robot, performing a navigation task. |
| Researcher Affiliation | Academia | Ruida Zhou Texas A&M University EMAIL Tao Liu Texas A&M University EMAIL Min Cheng Texas A&M University EMAIL Dileep Kalathil Texas A&M University EMAIL P. R. Kumar Texas A&M University EMAIL Chao Tian Texas A&M University EMAIL |
| Pseudocode | Yes | Algorithm 1: Robust Natural Actor-Critic; Algorithm 2: Robust Linear Temporal Difference (RLTD); Algorithm 3: Robust Q-Natural Policy Gradient (RQNPG) |
| Open Source Code | Yes | A video of the demonstration on TurtleBot is available at [Video Link] and the RNAC code is provided in the supplementary material. |
| Open Datasets | No | The paper mentions training on "multiple MuJoCo environments (Hopper-v3, Walker2d-v3, and HalfCheetah-v3)" and a "real-world TurtleBot navigation task." These are simulation environments and a physical robot, not static publicly available datasets with explicit access information like a link, DOI, or formal citation for data files. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions training data as 'on-policy trajectory data' or 'batch of on-policy data' and parameters like 'maximum training steps' and 'batch size', but no specific percentages or counts for data partitioning. |
| Hardware Specification | Yes | All experimental results are carried out on a Linux server with 48-core RTX 6000 GPUs, 48-core Intel Xeon 6248R CPUs, and 384 GB DDR4 RAM. |
| Software Dependencies | No | The paper mentions using ADAM [24] as an optimizer and a neural Gaussian policy [48] but does not provide specific version numbers for these or other key software components (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We use the same hyperparameters across different MuJoCo environments. Specifically, we select γ = 0.99 for the discount factor, ηₜ = αₜ = 3 × 10⁻⁴ for learning rates of both actor and critic updates implemented by ADAM [24], T = 3 × 10⁶ for the maximum training steps, and B = 2048 for the batch size. |
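
For anyone re-running the experiments, the hyperparameters quoted in the Experiment Setup row could be collected in a single configuration object. The following is a minimal sketch, not the authors' code: the class name, field names, and the `num_full_batches` helper are hypothetical, and it reads the learning rate as 3 × 10⁻⁴ and the step budget as 3 × 10⁶. It also assumes one policy update per batch of B on-policy steps, which the row above does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RNACHyperparameters:
    """Values quoted in the Experiment Setup row; all names are hypothetical."""

    discount_factor: float = 0.99         # gamma
    actor_learning_rate: float = 3e-4     # eta_t, applied through Adam
    critic_learning_rate: float = 3e-4    # alpha_t, applied through Adam
    max_training_steps: int = 3_000_000   # T
    batch_size: int = 2048                # B

    def num_full_batches(self) -> int:
        # Assumption: one update per full batch of B on-policy steps,
        # so a trailing partial batch is not counted.
        return self.max_training_steps // self.batch_size


config = RNACHyperparameters()
print(config.num_full_batches())
```

Under that assumption, the step budget covers 1464 full batches, which gives a rough sense of how many actor and critic updates a single run performs.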