Natural Actor-Critic for Robust Reinforcement Learning with Function Approximation
Authors: Ruida Zhou, Tao Liu, Min Cheng, Dileep Kalathil, P. R. Kumar, Chao Tian
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement the proposed RNAC in multiple MuJoCo environments (Hopper-v3, Walker2d-v3, and HalfCheetah-v3), and demonstrate that RNAC with the proposed uncertainty sets results in robust behavior while canonical policy-based approaches suffer significant performance degradation. We also test RNAC on TurtleBot [5], a real-world mobile robot, performing a navigation task. |
| Researcher Affiliation | Academia | Ruida Zhou (Texas A&M University, ruida@tamu.edu); Tao Liu (Texas A&M University, tliu@tamu.edu); Min Cheng (Texas A&M University, minrara0404@tamu.edu); Dileep Kalathil (Texas A&M University, dileep.kalathil@tamu.edu); P. R. Kumar (Texas A&M University, prk@tamu.edu); Chao Tian (Texas A&M University, chao.tian@tamu.edu) |
| Pseudocode | Yes | Algorithm 1: Robust Natural Actor-Critic (RNAC); Algorithm 2: Robust Linear Temporal Difference (RLTD); Algorithm 3: Robust Q-Natural Policy Gradient (RQNPG). (An illustrative sketch of how these pieces fit together appears after the table.) |
| Open Source Code | Yes | A video of the demonstration on Turtle Bot is available at [Video Link] and the RNAC code is provided in the supplementary material. |
| Open Datasets | No | The paper mentions training on "multiple MuJoCo environments (Hopper-v3, Walker2d-v3, and HalfCheetah-v3)" and a "real-world TurtleBot navigation task." These are simulation environments and a physical robot, not static publicly available datasets with explicit access information such as a link, DOI, or formal citation for data files. |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions training data as 'on-policy trajectory data' or 'batch of on-policy data' and parameters like 'maximum training steps' and 'batch size', but no specific percentages or counts for data partitioning. |
| Hardware Specification | Yes | All experimental results are carried out on a Linux server with 48-core RTX 6000 GPUs, 48-core Intel Xeon 6248R CPUs, and 384 GB DDR4 RAM. |
| Software Dependencies | No | The paper mentions using ADAM [24] as an optimizer and a neural Gaussian policy [48] but does not provide specific version numbers for these or other key software components (e.g., Python, PyTorch, TensorFlow). |
| Experiment Setup | Yes | We use the same hyperparameters across different MuJoCo environments. Specifically, we select γ = 0.99 for the discount factor, ηt = αt = 3 × 10⁻⁴ for the learning rates of both actor and critic updates implemented by ADAM [24], T = 3 × 10⁶ for the maximum training steps, and B = 2048 for the batch size. (These values are collected in the config sketch after the table.) |
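
For readers who want a concrete picture of how the three algorithms listed under Pseudocode interact, below is a minimal, illustrative Python sketch of the RNAC outer loop on a toy tabular MDP. The toy MDP, the R-contamination-style pessimistic backup (standing in for the paper's double-sampling/IPM uncertainty sets), the uncertainty radius `delta`, and all variable names are our assumptions for illustration, not the authors' implementation.

```python
# Minimal illustrative sketch of the RNAC loop (Algorithm 1) on a toy
# tabular MDP. The uncertainty set is a simple R-contamination-style
# pessimistic backup, a stand-in for the paper's double-sampling/IPM sets.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions = 5, 3
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # nominal kernel
R = rng.uniform(size=(n_states, n_actions))                        # reward table
gamma, delta = 0.99, 0.1          # discount factor; uncertainty radius (assumed)
alpha, eta = 3e-4, 3e-4           # critic/actor step sizes, echoing the paper

theta = np.zeros((n_states, n_actions))  # softmax policy logits (actor)
w = np.zeros(n_states)                   # linear critic weights (one-hot features)

def policy(s):
    z = np.exp(theta[s] - theta[s].max())
    return z / z.sum()

def robust_value(next_dist, w):
    # Pessimistic next-state value: mix the nominal expectation with the
    # worst-case state, a crude stand-in for the paper's uncertainty sets.
    return (1 - delta) * next_dist @ w + delta * w.min()

for _ in range(100_000):
    s = rng.integers(n_states)
    a = rng.choice(n_actions, p=policy(s))
    # Critic: one robust TD step (stand-in for RLTD, Algorithm 2).
    target = R[s, a] + gamma * robust_value(P[s, a], w)
    w[s] += alpha * (target - w[s])
    # Actor: natural-gradient step (stand-in for RQNPG, Algorithm 3). For a
    # tabular softmax policy, NPG reduces (up to scaling) to incrementing the
    # taken action's logit by its robust advantage estimate.
    q = R[s] + gamma * np.array([robust_value(P[s, b], w) for b in range(n_actions)])
    theta[s, a] += eta * (q[a] - policy(s) @ q)
```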
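
The MuJoCo hyperparameters reported in the Experiment Setup row, gathered into a plain Python dict for quick reference; the dict layout and key names are ours, while the values are the paper's.

```python
# Reported MuJoCo hyperparameters, collected for reference. Key names are
# our own labels; the values are those stated in the paper.
rnac_hyperparams = {
    "gamma": 0.99,                    # discount factor γ
    "actor_lr": 3e-4,                 # ηt, optimized with ADAM
    "critic_lr": 3e-4,                # αt, optimized with ADAM
    "max_training_steps": 3_000_000,  # T
    "batch_size": 2048,               # B
}
```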