An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient
Authors: Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical evaluation in domains where risk-aversion can be clearly defined shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk in terms of variance and Gini deviation when others fail to learn a reasonable policy. |
| Researcher Affiliation | Academia | Yudong Luo (University of Waterloo; Vector Institute), Guiliang Liu (The Chinese University of Hong Kong, Shenzhen), Pascal Poupart (University of Waterloo; Vector Institute), Yangchen Pan (University of Oxford) |
| Pseudocode | Yes | The full algorithms are summarized in Algorithm 1 and 2. |
| Open Source Code | Yes | Code is available at https://github.com/miyunluo/mean-gini (footnote 2 in the paper). |
| Open Datasets | Yes | This domain is taken from OpenAI Gym Box2D environments [15]. ... MuJoCo [16] is a collection of robotics environments with continuous states and actions in OpenAI Gym [15]. |
| Dataset Splits | No | The paper describes episode collection for training and evaluation for testing, but it does not specify explicit train/validation/test dataset splits with percentages or counts as typically found in supervised learning datasets. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using OpenAI Gym environments and MuJoCo, but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Learning Parameters. We set discount factor γ = 0.999. MVO: policy learning rate is 1e-5 ∈ {5e-5, 1e-5, 5e-6}, value function learning rate is 100 times policy learning rate. λ = 1.0 ∈ {0.6, 0.8, 1.0, 1.2}. Sample size n = 50. Maximum inner update number M = 10. IS ratio range δ = 0.5. Inner termination ratio β = 0.6. |
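
The MVO settings quoted in the Experiment Setup row can be collected into a small configuration object, which makes the derived value-function learning rate explicit. The following is a minimal sketch, not code from the authors' repository: the class name `MVOConfig`, its field names, and the dictionary `OTHER_REPORTED` are hypothetical. It also assumes that the braces after each value list the candidates that were searched. The excerpt does not say which method the remaining settings (n, M, δ, β) belong to, so they are kept separate and unattributed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MVOConfig:
    """Hypothetical container for the MVO settings quoted in the table."""

    gamma: float = 0.999               # discount factor γ
    policy_lr: float = 1e-5            # selected policy learning rate
    value_lr_multiplier: float = 100.0 # value lr = 100 x policy lr
    risk_lambda: float = 1.0           # selected λ
    # Assumption: the braces in the table list the searched candidates.
    policy_lr_grid: tuple = (5e-5, 1e-5, 5e-6)
    risk_lambda_grid: tuple = (0.6, 0.8, 1.0, 1.2)

    @property
    def value_lr(self) -> float:
        """Value-function learning rate derived from the policy rate."""
        return self.value_lr_multiplier * self.policy_lr


# Settings quoted in the same row without an explicit method attribution
# in this excerpt; they are recorded as-is and not tied to MVO.
OTHER_REPORTED = {
    "sample_size_n": 50,
    "max_inner_updates_M": 10,
    "is_ratio_range_delta": 0.5,
    "inner_termination_ratio_beta": 0.6,
}

config = MVOConfig()
print(f"policy lr = {config.policy_lr:g}, value lr = {config.value_lr:g}")
```

Deriving `value_lr` from `policy_lr` instead of storing it separately keeps the two rates in the stated 100:1 ratio if the policy learning rate is later swept over its candidates.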