Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning
Authors: Shangtong Zhang, Bo Liu, Shimon Whiteson (pp. 10905-10913)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | All curves in this section are averaged over 10 independent runs, with shaded regions indicating standard errors. All implementations are publicly available. We report the mean of those 20 episodic returns against the training steps in Figure 1. The curves are generated by setting λ = 1. More details are provided in the appendix. |
| Researcher Affiliation | Academia | ¹ University of Oxford, ² Auburn University |
| Pseudocode | Yes | Algorithm 1: Mean-Variance Policy Iteration (MVPI) and Algorithm 2: Off-line MVPI (a hedged Python sketch of the MVPI loop is given after this table) |
| Open Source Code | Yes | All implementations are publicly available: https://github.com/ShangtongZhang/DeepRL |
| Open Datasets | Yes | We benchmark MVPI-TD3 on eight MuJoCo robot manipulation tasks from OpenAI Gym. |
| Dataset Splits | No | No specific train/validation/test dataset splits (percentages or sample counts) are mentioned in the paper for the MuJoCo tasks, which are continuous control environments; evaluation is described as 'evaluate the algorithm every 10^4 steps for 20 episodes'. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory) used for running experiments are provided in the paper. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x) are explicitly mentioned in the paper. |
| Experiment Setup | Yes | We run each algorithm for 10^6 steps and evaluate the algorithm every 10^4 steps for 20 episodes. We use two-hidden-layer neural networks for function approximation. In the policy evaluation step of MVPI-TD3, we set y_{k+1} to the average of the recent K rewards, where K is a hyperparameter to be tuned (see the MVPI-TD3 reward-averaging sketch after this table)... The curves are generated by setting λ = 1. More details are provided in the appendix. |
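
The pseudocode row above refers to Algorithm 1 (MVPI). Under the Fenchel-dual rewriting of the per-step mean-variance objective that MVPI builds on, E[r] − λ·Var(r) = max_y { E[r − λr² + 2λyr] − λy² }, and the maximizing y is the policy's average per-step reward; the loop then alternates between updating y and improving the policy against the modified reward. The sketch below is a minimal illustration of that loop under this assumed form, not the authors' implementation: `evaluate_policy`, `improve_policy`, and the toy bandit are illustrative placeholders.

```python
def augmented_reward(r, y, lam=1.0):
    """Modified per-step reward for the MVPI inner problem with dual variable y held fixed."""
    return r - lam * r ** 2 + 2.0 * lam * y * r


def mvpi(evaluate_policy, improve_policy, policy, lam=1.0, iterations=50):
    """Skeleton of the MVPI outer loop (cf. Algorithm 1).

    evaluate_policy(policy)            -> average per-step reward of the policy (the new y).
    improve_policy(policy, reward_fn)  -> policy improved w.r.t. the given reward function.
    Any policy-evaluation / policy-improvement pair can be plugged in.
    """
    for _ in range(iterations):
        y = evaluate_policy(policy)  # y_{k+1} = average reward under the current policy
        policy = improve_policy(policy, lambda r: augmented_reward(r, y, lam))
    return policy


if __name__ == "__main__":
    # Toy two-armed bandit (rewards 0 and 1), just to make the loop runnable end to end.
    rewards = [0.0, 1.0]
    grid = [i / 100 for i in range(101)]  # candidate probabilities of pulling arm 1

    def evaluate_policy(p):
        return (1 - p) * rewards[0] + p * rewards[1]  # average per-step reward

    def improve_policy(p, reward_fn):
        # Greedy improvement: pick the arm probability maximizing the expected augmented reward.
        return max(grid, key=lambda q: (1 - q) * reward_fn(rewards[0]) + q * reward_fn(rewards[1]))

    print(mvpi(evaluate_policy, improve_policy, policy=0.5, lam=1.0))  # converges to p = 1.0
```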
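
For the MVPI-TD3 setup quoted in the last row, the policy-evaluation step reduces to tracking y_{k+1} as the average of the most recent K rewards and feeding the augmented reward to an otherwise standard TD3 agent. The snippet below is a minimal sketch of that bookkeeping under the same assumed augmented-reward form as above; the class name and the commented replay-buffer interface are hypothetical, not taken from the authors' code.

```python
from collections import deque


class RecentRewardAverager:
    """Tracks y as the running average of the most recent K rewards (K is a tuned
    hyperparameter) and maps each raw reward to the augmented reward before it
    reaches the TD3 replay buffer."""

    def __init__(self, K=100, lam=1.0):
        self.recent = deque(maxlen=K)
        self.lam = lam

    def __call__(self, r):
        self.recent.append(r)
        y = sum(self.recent) / len(self.recent)  # y_{k+1}: average of the recent K rewards
        return r - self.lam * r ** 2 + 2.0 * self.lam * y * r


# Hypothetical usage around a TD3 training loop:
# augment = RecentRewardAverager(K=100, lam=1.0)   # λ = 1 matches the reported curves
# r_hat = augment(reward)
# replay_buffer.add(state, action, r_hat, next_state, done)
```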