Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning

Authors: Xiaoteng Ma, Junyao Chen, Li Xia, Jun Yang, Qianchuan Zhao, Zhengyuan Zhou

JAIR 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Extensive experiments demonstrate DSAC's effectiveness in enhancing agent performance for both risk-neutral and risk-sensitive control tasks. In this section, we conduct experiments to answer the following questions: ... Three groups of experiments are designed to address these questions above.
Researcher Affiliation Academia XIAOTENG MA, Department of Automation, Tsinghua University, China; JUNYAO CHEN, School of Engineering and Applied Science, Columbia University, United States; LI XIA, School of Business, Sun Yat-sen University, China; JUN YANG, Department of Automation, Tsinghua University, China; QIANCHUAN ZHAO, Department of Automation, Tsinghua University, China; ZHENGYUAN ZHOU, Stern School of Business, New York University, United States
Pseudocode Yes Algorithm 1 DSAC update
Open Source Code Yes The source code of our DSAC implementation is available online: https://github.com/xtma/dsac
Open Datasets Yes We evaluate our algorithm with MuJoCo [62] and Box2D in OpenAI Gym [10].
Dataset Splits No The paper describes how initial states are randomly selected or sampled for simulated environments ('Risky Mass Point', 'Risky Ant') but does not specify fixed training/validation/test splits for any pre-existing dataset. Reinforcement learning environments typically generate data dynamically rather than relying on pre-split datasets for training or evaluation.
Hardware Specification Yes All experiments are performed on a server with 2 AMD EPYC 7702 64-Core Processor CPUs, 2x24-core Intel(R) Xeon(R) Platinum 8268 CPUs, and 8 Nvidia GeForce RTX 2080 Ti GPUs.
Software Dependencies No We implement our algorithm based on rlpyt [57], a well-developed PyTorch [39] RL toolkit. (The paper mentions 'rlpyt' and 'PyTorch' but does not provide specific version numbers for these software components, which are necessary for reproducibility.)
Experiment Setup Yes Hyper-parameters and implementation details are listed in Appendix B.1. The source code of our DSAC implementation is available online. We evaluate our algorithm with MuJoCo [62] and Box2D in OpenAI Gym [10]. (Specific hyperparameter values are detailed in Table 5 and Table 6 in Appendix B.2, covering parameters like learning rate, batch size, discount factor, and number of quantile fractions.)