Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Multi-Objective SPIBB: Seldonian Offline Policy Improvement with Safety Constraints in Finite MDPs

Authors: Harsh Satija, Philip S. Thomas, Joelle Pineau, Romain Laroche

NeurIPS 2021

Reproducibility Variable Result LLM Response
Research Type Experimental We extensively test our approach on a synthetic safety-gridworld task in Section 4 and show that the proposed algorithm achieves better data efficiency than the existing approaches. Finally, we show its benefits on a critical-care task in Section 5.
Researcher Affiliation Collaboration Harsh Satija McGill University, Mila EMAIL Philip S. Thomas University of Massachusetts EMAIL Joelle Pineau McGill University, Mila, Facebook AI Research EMAIL Romain Laroche Microsoft Research EMAIL
Pseudocode No The paper describes the proposed algorithms using mathematical formulations and textual descriptions, but it does not include any explicitly labeled pseudocode blocks or algorithm listings.
Open Source Code Yes The accompanying codebase is available at https://github.com/hercky/mo-spibb-codebase.
Open Datasets Yes We use the publicly available ICU dataset MIMIC-III (Johnson et al., 2016), with the setup described by Komorowski et al. (2018); Tang et al. (2020) and build on top of their data pre-processing and MDP construction methodology. This leaves us with a cohort of 20,954 unique patients.
Dataset Splits Yes We run our methods for 10 runs with different random seeds, where for each run the cohort dataset was split into train/valid/test sets in the ratios of 0.7/0.1/0.2.
Hardware Specification Yes The full pipeline including data processing and training took roughly 2 days on a single GPU (NVIDIA 1080 Ti).
Software Dependencies No The paper mentions using "standard solvers, such as cvxpy" but does not specify version numbers for cvxpy or any other software dependencies, making it difficult to reproduce the exact software environment.
Experiment Setup Yes We test on different combinations of user preference (λ) and baseline's quality (ρ) on 100 randomly generated CMDPs, where λ_i ∈ {0, 1}, ρ ∈ {0.1, 0.4, 0.7, 0.9} and |D| ∈ {10, 50, 500, 2000}. We evaluate under two settings: (i) we use a fixed set of parameters across different (λ, ρ) combinations, where we run S-OPT with ϵ ∈ {0.01, 0.1, 1.0} and H-OPT with Doubly Robust IS estimator (Jiang and Li, 2015) and Student's t-test concentration inequality
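
Illustration for the Dataset Splits row: a minimal Python sketch of a seeded 0.7/0.1/0.2 train/valid/test split repeated over 10 runs. It is not taken from the authors' codebase. The function name `split_cohort`, the use of seeds 0 to 9, the integer stand-in patient IDs, and the choice to split at the patient level are all assumptions made for illustration.

```python
import random


def split_cohort(patient_ids, seed, ratios=(0.7, 0.1, 0.2)):
    """Shuffle IDs with a seeded RNG and cut them into train/valid/test.

    Hypothetical helper: the ratios come from the summary above, everything
    else (names, seeding scheme, patient-level splitting) is assumed.
    """
    ids = sorted(patient_ids)  # fixed starting order, so the seed alone determines the split
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = int(ratios[0] * len(ids))
    n_valid = int(ratios[1] * len(ids))
    train = ids[:n_train]
    valid = ids[n_train:n_train + n_valid]
    test = ids[n_train + n_valid:]  # remainder goes to the test set
    return train, valid, test


# Integer IDs standing in for the 20,954-patient cohort (assumption).
cohort = range(20954)
# One independent split per run; seeds 0..9 are an assumed choice.
splits = [split_cohort(cohort, seed) for seed in range(10)]
```

Sending the remainder to the test set keeps the three parts disjoint and exhaustive even when the ratios do not divide the cohort size evenly.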