Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning
Authors: Xiaoteng Ma, Junyao Chen, Li Xia, Jun Yang, Qianchuan Zhao, Zhengyuan Zhou
JAIR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate DSAC's effectiveness in enhancing agent performances for both risk-neutral and risk-sensitive control tasks. In this section, we conduct experiments to answer the following questions: ... Three groups of experiments are designed to address these questions above. |
| Researcher Affiliation | Academia | XIAOTENG MA, Department of Automation, Tsinghua University, China; JUNYAO CHEN, School of Engineering and Applied Science, Columbia University, United States; LI XIA, School of Business, Sun Yat-sen University, China; JUN YANG, Department of Automation, Tsinghua University, China; QIANCHUAN ZHAO, Department of Automation, Tsinghua University, China; ZHENGYUAN ZHOU, Stern School of Business, New York University, United States |
| Pseudocode | Yes | Algorithm 1 DSAC update |
| Open Source Code | Yes | The source code of our DSAC implementation is available online. (Footnote 1: https://github.com/xtma/dsac) |
| Open Datasets | Yes | We evaluate our algorithm with MuJoCo [62] and Box2d in OpenAI Gym [10]. |
| Dataset Splits | No | The paper describes how initial states are randomly selected or sampled for simulated environments ('Risky Mass Point', 'Risky Ant') but does not specify fixed training/validation/test splits for any pre-existing dataset. Reinforcement learning environments typically generate data dynamically rather than relying on pre-split datasets for training or evaluation. |
| Hardware Specification | Yes | All experiments are performed on a servers with 2 AMD EPYC 7702 64-Core Processor CPUs, 2x24-core Intel(R) Xeon(R) Platinum 8268 CPUs, and 8 Nvidia GeForce RTX 2080 Ti GPUs. |
| Software Dependencies | No | We implement our algorithm based on rlpyt [57], a well-developed PyTorch [39] RL toolkit. (The paper mentions 'rlpyt' and 'PyTorch' but does not provide specific version numbers for these software components, which are necessary for reproducibility.) |
| Experiment Setup | Yes | Hyper-parameters and implementation details are listed in Appendix B.1. The source code of our DSAC implementation is available online. We evaluate our algorithm with MuJoCo [62] and Box2d in OpenAI Gym [10]. (Specific hyperparameter values are detailed in Table 5 and Table 6 in Appendix B.2, covering parameters like learning rate, batch size, discount factor, and number of quantile fractions.) |