Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Risk-Averse Constrained Reinforcement Learning with Optimized Certainty Equivalents

Authors: Jane Lee, Baturay Saglam, Spyridon Pougkakiotis, Amin Karbasi, Dionysis Kalogerias

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate practical usefulness of our approach in extensive numerical experiments on standard benchmarks, showcasing its effectiveness in reducing risk constraint violations and improving stochastic stability through explicit risk management. We conduct experiments on locomotion tasks from the Safety-Gymnasium benchmark [19].
Researcher Affiliation | Collaboration | Jane H. Lee, Department of Computer Science, Yale University, EMAIL; Baturay Saglam, Department of Electrical Engineering, Yale University, EMAIL; Spyridon Pougkakiotis, Department of Mathematics, King's College London, EMAIL; Amin Karbasi, Foundation AI, Cisco Systems Inc., EMAIL; Dionysis Kalogerias, Department of Electrical Engineering, Yale University, EMAIL
Pseudocode | Yes | Algorithm 1 Reward-Based SGDA with Risk Constraints
Open Source Code | Yes | We provide the code at https://github.com/baturaysaglam/risk-averse-constrained-RL.
Open Datasets | Yes | We conduct experiments on locomotion tasks from the Safety-Gymnasium benchmark [19].
Dataset Splits | Yes | Agents are evaluated every 1000 time steps by averaging the undiscounted sum of rewards over 10 episodes. The PPO agent uses the mean action, ensuring consistency in evaluation. Evaluations are entirely separate from training: no data is stored, and no network updates are performed. ... evaluated over 100 post-training episodes without further learning.
Hardware Specification | Yes | All experiments were performed on a computing system powered by an AMD Ryzen processor with 64 cores and 512 GB of RAM. A single NVIDIA RTX A6000 GPU with 48 GB VRAM was used for neural network training.
Software Dependencies | No | The paper mentions 'Proximal Policy Optimization (PPO) [39]' and 'Neural Networks' but does not specify version numbers for any software, libraries, or programming languages used.
Experiment Setup | Yes | Table 4: PPO hyperparameters used in the experiments. Table 5: Common hyperparameter values used across all environments.
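
The evaluation protocol quoted under Dataset Splits (average the undiscounted sum of rewards over a fixed number of episodes, act with the policy's mean action, store no data and perform no updates) can be illustrated with a minimal sketch. This is not the authors' code: ToyEnv is a hypothetical stand-in for a Safety-Gymnasium task with a simplified reset/step interface, and policy_mean is an assumed callable that returns the deterministic mean action.

```python
import random
from typing import Callable, Tuple


class ToyEnv:
    """Hypothetical stand-in environment (not the Safety-Gymnasium API)."""

    def __init__(self, horizon: int = 5, seed: int = 0) -> None:
        self.horizon = horizon
        self.rng = random.Random(seed)
        self.t = 0

    def reset(self) -> float:
        self.t = 0
        return 0.0

    def step(self, action: float) -> Tuple[float, float, bool]:
        self.t += 1
        reward = 1.0 - abs(action)          # toy reward, highest at action 0
        obs = self.rng.uniform(-1.0, 1.0)   # toy observation
        done = self.t >= self.horizon
        return obs, reward, done


def evaluate(env: ToyEnv,
             policy_mean: Callable[[float], float],
             episodes: int = 10) -> float:
    """Average undiscounted return over `episodes` rollouts.

    Nothing is written to a replay buffer and no parameters are
    updated, so evaluation stays separate from training.
    """
    total = 0.0
    for _ in range(episodes):
        obs, done = env.reset(), False
        episode_return = 0.0
        while not done:
            action = policy_mean(obs)       # deterministic mean action
            obs, reward, done = env.step(action)
            episode_return += reward        # undiscounted sum of rewards
        total += episode_return
    return total / episodes
```

In a training loop, a function like this would be called every 1000 time steps with episodes=10, and once more after training with episodes=100, matching the counts quoted above.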