Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
SafeAR: Safe Algorithmic Recourse by Risk-Aware Policies
Authors: Haochen Wu, Shubham Sharma, Sunandita Patra, Sriram Gopalakrishnan
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our method to two real-world datasets and compare policies with different risk-aversion levels using risk measures and recourse desiderata (sparsity and proximity). Evaluate the policies with different risk profiles computed by G-RSVI on two real-world datasets (UCI Adult Income, German Credit) |
| Researcher Affiliation | Collaboration | 1 University of Michigan, Ann Arbor; 2 J.P. Morgan AI Research |
| Pseudocode | Yes | Algorithm 1: G-RSVI. Input: recourse MDP ⟨S, A, T, R, H⟩, ML model f. Parameters: risk aversion level β ∈ [0, ] |
| Open Source Code | Yes | All supplemental materials (appendices and code implementations) are available through arxiv.org/abs/2308.12367. |
| Open Datasets | Yes | Adult Income Dataset (AID) (32561 data points) (Becker and Kohavi 1996) and German Credit Dataset (GCD) (1000 data points) (Hofmann 1994) |
| Dataset Splits | No | The paper mentions converting continuous features and training classifiers, but does not explicitly detail train/validation/test splits with percentages, sample counts, or specific methodologies for reproducibility. It states 'All measures are averaged over the entire dataset'. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions training 'random forest classifiers' but does not specify any software names with version numbers (e.g., scikit-learn version, Python version, or other libraries). |
| Experiment Setup | Yes | The horizon is set to 12. We select β = 0.25, 0.50, 0.75 for generating risk-averse recourse policies, and higher β indicates higher risk-aversion. We use qualitative assumptions (domain knowledge) on relative differences in action costs and success likelihood to define the action costs r(·) and transition model p(·). The transition probabilities are heuristically set by domain knowledge. |
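
The reported setup in the table above can be captured as a small configuration object. The following is a minimal sketch, not the authors' code: the class and function names (`Dataset`, `ExperimentSetup`, `runs`) are hypothetical, and only the horizon, the β values, and the dataset sizes are taken from the table. It does not implement G-RSVI, the action costs, or the transition model, none of which are specified here in enough detail.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Dataset:
    """A dataset name and its size as reported in the table (hypothetical type)."""
    name: str
    n_points: int


@dataclass(frozen=True)
class ExperimentSetup:
    """Reported experiment settings (hypothetical container, not from the paper)."""
    horizon: int
    betas: Tuple[float, ...]
    datasets: Tuple[Dataset, ...]


# Values below come from the "Open Datasets" and "Experiment Setup" rows.
SETUP = ExperimentSetup(
    horizon=12,
    betas=(0.25, 0.50, 0.75),  # higher beta means higher risk-aversion
    datasets=(
        Dataset("Adult Income", 32561),
        Dataset("German Credit", 1000),
    ),
)


def runs(setup: ExperimentSetup) -> List[Tuple[str, float]]:
    """Enumerate (dataset name, beta) pairs, one per risk-averse policy."""
    return [(d.name, b) for d in setup.datasets for b in setup.betas]


if __name__ == "__main__":
    for name, beta in runs(SETUP):
        print(f"{name}: beta={beta}, horizon={SETUP.horizon}")
```

Enumerating the runs this way makes the missing pieces explicit: anyone reproducing the experiments would still need the train/test splits, software versions, and the heuristic transition probabilities, which the table marks as unreported or domain-knowledge based.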