Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning
Authors: Xiaoteng Ma, Junyao Chen, Li Xia, Jun Yang, Qianchuan Zhao, Zhengyuan Zhou
JAIR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate DSAC's effectiveness in enhancing agent performances for both risk-neutral and risk-sensitive control tasks. In this section, we conduct experiments to answer the following questions: ... Three groups of experiments are designed to address these questions above. |
| Researcher Affiliation | Academia | Xiaoteng Ma, Department of Automation, Tsinghua University, China; Junyao Chen, School of Engineering and Applied Science, Columbia University, United States; Li Xia, School of Business, Sun Yat-sen University, China; Jun Yang, Department of Automation, Tsinghua University, China; Qianchuan Zhao, Department of Automation, Tsinghua University, China; Zhengyuan Zhou, Stern School of Business, New York University, United States |
| Pseudocode | Yes | Algorithm 1 DSAC update |
| Open Source Code | Yes | The source code of our DSAC implementation is available online: https://github.com/xtma/dsac |
| Open Datasets | Yes | We evaluate our algorithm with MuJoCo [62] and Box2D in OpenAI Gym [10]. |
| Dataset Splits | No | The paper describes how initial states are randomly selected or sampled for simulated environments ('Risky Mass Point', 'Risky Ant') but does not specify fixed training/validation/test splits for any pre-existing dataset. Reinforcement learning environments typically generate data dynamically rather than relying on pre-split datasets for training or evaluation. |
| Hardware Specification | Yes | All experiments are performed on a server with 2 AMD EPYC 7702 64-Core Processor CPUs, 2x24-core Intel(R) Xeon(R) Platinum 8268 CPUs, and 8 NVIDIA GeForce RTX 2080 Ti GPUs. |
| Software Dependencies | No | We implement our algorithm based on rlpyt [57], a well-developed PyTorch [39] RL toolkit. (The paper mentions 'rlpyt' and 'PyTorch' but does not provide specific version numbers for these software components, which are necessary for reproducibility.) |
| Experiment Setup | Yes | Hyper-parameters and implementation details are listed in Appendix B.1. The source code of our DSAC implementation is available online. We evaluate our algorithm with MuJoCo [62] and Box2D in OpenAI Gym [10]. (Specific hyperparameter values are detailed in Table 5 and Table 6 in Appendix B.2, covering parameters like learning rate, batch size, discount factor, and number of quantile fractions.) |
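The Software Dependencies row flags the common gap this report measures: the stack (rlpyt, PyTorch) is named but not version-pinned. As a minimal sketch of how a reproducer could close that gap, the helper below (hypothetical, not part of the DSAC codebase) records installed versions at run time using only the standard library, returning `None` for anything absent from the environment:

```python
from importlib import metadata

def record_versions(packages):
    """Return a {package: version} map for the current environment,
    with None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Log the paper's (unpinned) stack alongside experiment results.
print(record_versions(["torch", "rlpyt"]))
```

Emitting this map into the experiment log or a `requirements.txt` at training time would let the exact dependency versions be recovered later, which is what the "No" in this row indicates the paper itself does not do.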