Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
DSAC: Distributional Soft Actor-Critic for Risk-Sensitive Reinforcement Learning
Authors: Xiaoteng Ma, Junyao Chen, Li Xia, Jun Yang, Qianchuan Zhao, Zhengyuan Zhou
JAIR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate DSAC's effectiveness in enhancing agent performances for both risk-neutral and risk-sensitive control tasks. In this section, we conduct experiments to answer the following questions: ... Three groups of experiments are designed to address these questions above. |
| Researcher Affiliation | Academia | Xiaoteng Ma, Department of Automation, Tsinghua University, China; Junyao Chen, School of Engineering and Applied Science, Columbia University, United States; Li Xia, School of Business, Sun Yat-sen University, China; Jun Yang, Department of Automation, Tsinghua University, China; Qianchuan Zhao, Department of Automation, Tsinghua University, China; Zhengyuan Zhou, Stern School of Business, New York University, United States |
| Pseudocode | Yes | Algorithm 1 DSAC update |
| Open Source Code | Yes | The source code of our DSAC implementation is available online: https://github.com/xtma/dsac |
| Open Datasets | Yes | We evaluate our algorithm with MuJoCo [62] and Box2D in OpenAI Gym [10]. |
| Dataset Splits | No | The paper describes how initial states are randomly selected or sampled for simulated environments ('Risky Mass Point', 'Risky Ant') but does not specify fixed training/validation/test splits for any pre-existing dataset. Reinforcement learning environments typically generate data dynamically rather than relying on pre-split datasets for training or evaluation. |
| Hardware Specification | Yes | All experiments are performed on a server with 2 AMD EPYC 7702 64-Core Processor CPUs, 2x24-core Intel(R) Xeon(R) Platinum 8268 CPUs, and 8 NVIDIA GeForce RTX 2080 Ti GPUs. |
| Software Dependencies | No | We implement our algorithm based on rlpyt [57], a well-developed PyTorch [39] RL toolkit. (The paper mentions 'rlpyt' and 'PyTorch' but does not provide specific version numbers for these software components, which are necessary for reproducibility.) |
| Experiment Setup | Yes | Hyper-parameters and implementation details are listed in Appendix B.1. The source code of our DSAC implementation is available online. We evaluate our algorithm with MuJoCo [62] and Box2D in OpenAI Gym [10]. (Specific hyperparameter values are detailed in Table 5 and Table 6 in Appendix B.2, covering parameters like learning rate, batch size, discount factor, and number of quantile fractions.) |
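The Software Dependencies row flags the common gap this report measures: the stack (rlpyt, PyTorch) is named but not version-pinned. As a minimal sketch of how a reproducer could close that gap, the helper below (hypothetical, not part of the DSAC codebase) records installed versions at run time using only the standard library, returning `None` for anything absent from the environment:

```python
from importlib import metadata

def record_versions(packages):
    """Return a {package: version} map for the current environment,
    with None for packages that are not installed."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

# Log the paper's (unpinned) stack alongside experiment results.
print(record_versions(["torch", "rlpyt"]))
```

Emitting this map into the experiment log or a `requirements.txt` at training time would let the exact dependency versions be recovered later, which is what the "No" in this row indicates the paper itself does not do.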