Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

An Alternative to Variance: Gini Deviation for Risk-averse Policy Gradient

Authors: Yudong Luo, Guiliang Liu, Pascal Poupart, Yangchen Pan

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirical evaluation in domains where risk-aversion can be clearly defined, shows that our algorithm can mitigate the limitations of variance-based risk measures and achieves high return with low risk in terms of variance and Gini deviation when others fail to learn a reasonable policy.
Researcher Affiliation	Academia	Yudong Luo (1,4), Guiliang Liu (2), Pascal Poupart (1,4), Yangchen Pan (3). Affiliations: (1) University of Waterloo; (2) The Chinese University of Hong Kong, Shenzhen; (3) University of Oxford; (4) Vector Institute.
Pseudocode	Yes	The full algorithms are summarized in Algorithm 1 and 2.
Open Source Code	Yes	Code is available at https://github.com/miyunluo/mean-gini (footnote 2 in the paper).
Open Datasets	Yes	This domain is taken from Open AI Gym Box2D environments [15]. ... Mujoco [16] is a collection of robotics environments with continuous states and actions in Open AI Gym [15].
Dataset Splits	No	The paper describes episode collection for training and evaluation for testing, but it does not specify explicit train/validation/test dataset splits with percentages or counts as typically found in supervised learning datasets.
Hardware Specification	No	The paper does not explicitly describe the specific hardware (e.g., CPU, GPU models, or detailed computer specifications) used for running its experiments.
Software Dependencies	No	The paper mentions using Open AI Gym environments and Mujoco, but does not provide specific version numbers for these or other software dependencies.
Experiment Setup	Yes	Learning Parameters. We set discount factor γ = 0.999. MVO: policy learning rate is 1e-5 {5e-5, 1e-5, 5e-6}, value function learning rate is 100 times policy learning rate. λ = 1.0 {0.6, 0.8, 1.0, 1.2}. Sample size n = 50. Maximum inner update number M = 10. IS ratio range δ = 0.5. Inner termination ratio β = 0.6.
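
The values quoted in the Experiment Setup row can be collected into a small configuration object, which also makes the derived value-function learning rate explicit. This is only a sketch: every class, field, and constant name is hypothetical and does not come from the paper or its repository. Reading the braces as the candidate values that were searched is an assumption, and the excerpt does not make clear which method each of n, M, δ, and β belongs to, so they are kept in one flat record here.

```python
import math
from dataclasses import dataclass


# Hypothetical container: only the numeric values come from the quoted excerpt.
@dataclass(frozen=True)
class QuotedSetup:
    gamma: float = 0.999                   # discount factor γ
    policy_lr: float = 1e-5                # policy learning rate reported for MVO
    value_lr_multiplier: float = 100.0     # value lr is 100 times the policy lr
    lam: float = 1.0                       # λ
    sample_size: int = 50                  # n
    max_inner_updates: int = 10            # M
    is_ratio_range: float = 0.5            # δ
    inner_termination_ratio: float = 0.6   # β

    @property
    def value_lr(self) -> float:
        """Value-function learning rate derived from the policy learning rate."""
        return self.policy_lr * self.value_lr_multiplier


# Assumption: the braces in the excerpt list the candidate values that were searched.
POLICY_LR_GRID = (5e-5, 1e-5, 5e-6)
LAMBDA_GRID = (0.6, 0.8, 1.0, 1.2)

setup = QuotedSetup()
print(f"policy lr = {setup.policy_lr:g}, value lr = {setup.value_lr:g}")
```

With the reported policy learning rate of 1e-5, the derived value-function learning rate is about 1e-3.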