Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Mean-Semivariance Policy Optimization via Risk-Averse Reinforcement Learning
Authors: Xiaoteng Ma, Shuai Ma, Li Xia, Qianchuan Zhao
JAIR 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods. |
| Researcher Affiliation | Academia | Xiaoteng Ma, EMAIL, Department of Automation, Tsinghua University, Beijing, 100086, P. R. China; Shuai Ma, EMAIL, School of Business, Sun Yat-sen University, Guangzhou, 510275, P. R. China; Li Xia, EMAIL (corresponding author), School of Business, Sun Yat-sen University, Guangzhou, 510275, P. R. China; Qianchuan Zhao, EMAIL, Department of Automation, Tsinghua University, Beijing, 100086, P. R. China |
| Pseudocode | Yes | Algorithm 1: The framework of MSV optimization; Algorithm 2: MSVAC; Algorithm 3: MSVPO |
| Open Source Code | No | The paper does not contain any explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | Yes | Finally, we conduct diverse experiments from simple bandit problems to continuous control tasks in MuJoCo, which demonstrate the effectiveness of our proposed methods. |
| Dataset Splits | No | The paper describes environments used for experiments (e.g., MuJoCo's Walker2d) but does not provide explicit training/test/validation dataset splits. Instead, it describes an experimental protocol for these environments. |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., CPU, GPU models) used for running the experiments. |
| Software Dependencies | No | The paper mentions MuJoCo and OpenAI Gym, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Table 3: Hyper-parameters of MSVPO. Network learning rate β: 3e-4; Network hidden sizes: [64, 64]; Activation function: Tanh; Optimizer: Adam; Batch size: 256; Gradient Clipping: 10; Clipping parameter ε: 0.2; Optimization Epochs M: 10; GAE parameter λ: 0.95; Average Value Constraint Coefficient in APO (Ma et al., 2021) ν: 0.3 |
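
For anyone attempting a reproduction, the Table 3 values quoted in the Experiment Setup row could be collected into a single configuration object. The sketch below is only illustrative: the class name `MSVPOConfig` and all field names are hypothetical and do not come from the paper (no code was released), and the report does not say whether "Gradient Clipping: 10" refers to a norm or a per-value bound, so that field is left deliberately generic.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class MSVPOConfig:
    """Hypothetical container for the MSVPO hyper-parameters quoted from Table 3.

    Field names are assumptions made for illustration; only the values
    come from the report above.
    """

    learning_rate: float = 3e-4            # Network learning rate (beta)
    hidden_sizes: Tuple[int, int] = (64, 64)  # Network hidden sizes
    activation: str = "tanh"               # Activation function
    optimizer: str = "adam"                # Optimizer
    batch_size: int = 256                  # Batch size
    grad_clip: float = 10.0                # Gradient clipping (norm vs. value not stated)
    clip_epsilon: float = 0.2              # Clipping parameter (epsilon)
    optimization_epochs: int = 10          # Optimization epochs (M)
    gae_lambda: float = 0.95               # GAE parameter (lambda)
    avg_value_constraint_coef: float = 0.3  # Average value constraint coefficient in APO (nu)


config = MSVPOConfig()
```

Freezing the dataclass keeps the recorded settings immutable, so a run cannot silently drift from the values listed in the table.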