An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient

Authors: Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical evaluation in domains where risk-aversion can be clearly defined, shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk in terms of variance and Gini deviation when others fail to learn a reasonable policy.
Researcher Affiliation | Academia | Yudong Luo (1,4), Guiliang Liu (2), Pascal Poupart (1,4), Yangchen Pan (3). (1) University of Waterloo, (2) The Chinese University of Hong Kong, Shenzhen, (3) University of Oxford, (4) Vector Institute
Pseudocode | Yes | The full algorithms are summarized in Algorithm 1 and 2.
Open Source Code | Yes | Code is available at https://github.com/miyunluo/mean-gini
Open Datasets | Yes | This domain is taken from OpenAI Gym Box2D environments [15]. ... Mujoco [16] is a collection of robotics environments with continuous states and actions in OpenAI Gym [15].
Dataset Splits | No | The paper describes collecting episodes for training and evaluation, but it does not specify explicit train/validation/test splits with percentages or counts, as is typical for supervised learning datasets.
Hardware Specification | No | The paper does not describe the hardware (e.g., CPU or GPU models, or other machine specifications) used to run its experiments.
Software Dependencies | No | The paper mentions using OpenAI Gym environments and Mujoco, but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | Learning Parameters. We set discount factor γ = 0.999. MVO: policy learning rate is 1e-5 {5e-5, 1e-5, 5e-6}, value function learning rate is 100 times policy learning rate. λ = 1.0 {0.6, 0.8, 1.0, 1.2}. Sample size n = 50. Maximum inner update number M = 10. IS ratio range δ = 0.5. Inner termination ratio β = 0.6.
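
The learning parameters quoted in the Experiment Setup row can be collected into a small configuration object for anyone trying to reproduce the runs. The sketch below is illustrative only: the class name, field names, and SEARCH_GRID are hypothetical and do not come from the paper or its repository. It also assumes that the values in braces are the candidates that were searched, and that all listed parameters belong to one setup, which the quoted excerpt does not make fully clear.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Hyperparameters as quoted in the Experiment Setup row (names are hypothetical)."""

    gamma: float = 0.999                   # discount factor
    policy_lr: float = 1e-5                # policy learning rate (selected value)
    value_lr_multiplier: float = 100.0     # value function lr = 100 x policy lr
    lam: float = 1.0                       # lambda (selected value)
    sample_size: int = 50                  # n
    max_inner_updates: int = 10            # M
    is_ratio_range: float = 0.5            # delta, importance sampling ratio range
    inner_termination_ratio: float = 0.6   # beta

    @property
    def value_lr(self) -> float:
        # Derived from the statement that the value function learning rate
        # is 100 times the policy learning rate.
        return self.policy_lr * self.value_lr_multiplier


# Assumption: the braces in the quoted text list the candidate values searched.
SEARCH_GRID = {
    "policy_lr": (5e-5, 1e-5, 5e-6),
    "lam": (0.6, 0.8, 1.0, 1.2),
}

if __name__ == "__main__":
    cfg = ReportedSetup()
    print(cfg)
    print("value function learning rate:", cfg.value_lr)
```

Keeping the derived value function learning rate as a property, rather than a stored field, means it stays consistent if the policy learning rate is changed during a sweep over SEARCH_GRID.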