Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ElliCE: Efficient and Provably Robust Algorithmic Recourse via the Rashomon Sets
Authors: Bohdan Turbal, Iryna Voitsitska, Lesia Semenova
NeurIPS 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In our evaluation pipeline, we work with the hypothesis space of linear models and multi-layer perceptrons (MLPs). However, our results can be extended to other hypothesis spaces that can be optimized with gradient descent, such as neural additive models [1]. In this section, we empirically show that ElliCE is faster and more robust as compared to other methods that produce robust counterfactuals. Please see Appendix B for additional details and results. |
| Researcher Affiliation | Academia | Bohdan Turbal¹, Iryna Voitsitska², Lesia Semenova³; ¹Princeton University, ²Ukrainian Catholic University, ³Rutgers University |
| Pseudocode | No | The paper does not contain any explicit pseudocode or algorithm blocks. The methodology is described through mathematical formulations and textual descriptions in Section 4 and its subsections. |
| Open Source Code | Yes | Code Availability: Implementations of ElliCE are available at https://github.com/BogdanTurbal/ElliCE_EXPERIMENTS. |
| Open Datasets | Yes | We consider nine datasets from high-stakes decision domains such as lending (Australian Credit [53], FICO [20], German Credit [27], Banknote [44]), healthcare (Parkinson's [60], Diabetes [58]), and recidivism (COMPAS [2]), as well as benchmark datasets (Wine Quality [13], Extended Iris [3]). |
| Dataset Splits | Yes | For every dataset, we performed 4-fold stratified cross-validation. Within each fold, the training data are further split into 80% for training and 20% for validation. |
| Hardware Specification | No | The paper mentions training linear models and MLPs using specific optimizers and regularization parameters, but does not provide specific details about the CPU or GPU models, memory, or other hardware specifications used for the experiments. |
| Software Dependencies | No | The paper mentions using "Scikit-learn's LBFGS solver" for linear models and the "Adam optimizer" for MLPs, but does not provide specific version numbers for these libraries or other software dependencies. |
| Experiment Setup | Yes | For evaluators, we define a target multiplicity tolerance globally in range $\varepsilon_{\text{target}} \in [0, 0.1]$. ... For every dataset, we performed 4-fold stratified cross-validation. Within each fold, the training data are further split into 80% for training and 20% for validation. ... Linear models are trained using Scikit-learn's LBFGS solver with an $\ell_2$ penalty (regularization parameter 0.001). MLPs are trained with the Adam optimizer (learning rate 0.001), early stopping, and $\ell_2$ regularization parameter 0.001. |
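
The split protocol and training configuration quoted in the table can be sketched with scikit-learn as below. This is a minimal sketch under stated assumptions, not the authors' code (that lives in the linked repository). It uses synthetic placeholder data from `make_classification` instead of the nine datasets, and scikit-learn's `MLPClassifier` stands in for the authors' MLP. The reported regularization parameter 0.001 is mapped to `C = 1 / 0.001` for `LogisticRegression`, which parameterizes L2 strength as an inverse; that mapping is an assumption, since the paper's exact parameterization is not stated here. The hidden-layer size, early-stopping patience, epoch cap, and helper names (`fit_linear`, `fit_mlp`, `run_protocol`) are hypothetical.

```python
import copy

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss
from sklearn.model_selection import StratifiedKFold, train_test_split
from sklearn.neural_network import MLPClassifier

REG = 1e-3  # l2 regularization parameter reported for both model families
LR = 1e-3   # Adam learning rate reported for the MLPs


def fit_linear(X_tr, y_tr):
    # Assumption: lambda = 0.001 maps to C = 1 / lambda.
    # LogisticRegression applies an l2 penalty by default; solver is set to lbfgs explicitly.
    return LogisticRegression(solver="lbfgs", C=1.0 / REG, max_iter=1000).fit(X_tr, y_tr)


def fit_mlp(X_tr, y_tr, X_val, y_val, patience=10, max_epochs=200, seed=0):
    # Hypothetical architecture, patience, and epoch cap (not reported in the table).
    # alpha is MLPClassifier's l2 penalty; early stopping monitors the 20% validation split.
    mlp = MLPClassifier(hidden_layer_sizes=(64,), solver="adam",
                        learning_rate_init=LR, alpha=REG, random_state=seed)
    classes = np.unique(y_tr)
    best, best_loss, wait = None, np.inf, 0
    for _ in range(max_epochs):
        mlp.partial_fit(X_tr, y_tr, classes=classes)  # one pass over the training split
        val_loss = log_loss(y_val, mlp.predict_proba(X_val), labels=classes)
        if val_loss < best_loss:
            # Keep a snapshot of the best model seen so far on the validation split.
            best, best_loss, wait = copy.deepcopy(mlp), val_loss, 0
        else:
            wait += 1
            if wait >= patience:
                break
    return best


def run_protocol(X, y, seed=0):
    results = []
    # Outer loop: 4-fold stratified cross-validation.
    outer = StratifiedKFold(n_splits=4, shuffle=True, random_state=seed)
    for fold, (train_idx, test_idx) in enumerate(outer.split(X, y)):
        # Within each fold: 80% train / 20% validation, stratified by label.
        X_tr, X_val, y_tr, y_val = train_test_split(
            X[train_idx], y[train_idx], test_size=0.2,
            stratify=y[train_idx], random_state=seed)
        linear = fit_linear(X_tr, y_tr)
        mlp = fit_mlp(X_tr, y_tr, X_val, y_val, seed=seed)
        results.append({
            "fold": fold,
            "n_train": len(y_tr), "n_val": len(y_val), "n_test": len(test_idx),
            "linear_acc": linear.score(X[test_idx], y[test_idx]),
            "mlp_acc": mlp.score(X[test_idx], y[test_idx]),
        })
    return results


if __name__ == "__main__":
    # Synthetic placeholder data, not one of the paper's datasets.
    X, y = make_classification(n_samples=400, n_features=8, random_state=0)
    for row in run_protocol(X, y):
        print(row)
```

Running the script prints one row per fold with split sizes and test accuracies; with 400 samples, each fold holds out 100 test points and splits the remaining 300 into 240 for training and 60 for validation. Early stopping is driven manually by the explicit 20% validation split rather than `MLPClassifier(early_stopping=True)`, which would carve its own hold-out out of the 80% training portion.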