Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

ElliCE: Efficient and Provably Robust Algorithmic Recourse via the Rashomon Sets

Authors: Bohdan Turbal, Iryna Voitsitska, Lesia Semenova

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In our evaluation pipeline, we work with the hypothesis space of linear models and multi-layer perceptrons (MLPs). However, our results can be extended to other hypothesis spaces that can be optimized with gradient descent, such as neural additive models [1]. In this section, we empirically show that ElliCE is faster and more robust as compared to other methods that produce robust counterfactuals. Please see Appendix B for additional details and results.
Researcher Affiliation	Academia	Bohdan Turbal (1), Iryna Voitsitska (2), Lesia Semenova (3); (1) Princeton University, (2) Ukrainian Catholic University, (3) Rutgers University. EMAIL, EMAIL, EMAIL
Pseudocode	No	The paper does not contain any explicit pseudocode or algorithm blocks. The methodology is described through mathematical formulations and textual descriptions in Section 4 and its subsections.
Open Source Code	Yes	Code Availability: Implementations of ElliCE are available at https://github.com/BogdanTurbal/ElliCE_EXPERIMENTS.
Open Datasets	Yes	We consider nine datasets from high-stakes decision domains such as lending (Australian Credit [53], FICO [20], German Credit [27], Banknote [44]), healthcare (Parkinson's [60], Diabetes [58]), and recidivism (COMPAS [2]), as well as benchmark datasets (Wine Quality [13], Extended Iris [3]).
Dataset Splits	Yes	For every dataset, we performed 4-fold stratified cross-validation. Within each fold, the training data are further split into 80% for training and 20% for validation.
Hardware Specification	No	The paper mentions training linear models and MLPs using specific optimizers and regularization parameters, but does not provide specific details about the CPU or GPU models, memory, or other hardware specifications used for the experiments.
Software Dependencies	No	The paper mentions using 'Scikit-learn's LBFGS solver' for linear models and 'Adam optimizer' for MLPs, but does not provide specific version numbers for these libraries or other software dependencies.
Experiment Setup	Yes	For evaluators, we define a target multiplicity tolerance globally in range $\epsilon_{\text{target}} \in [0, 0.1]$. ... For every dataset, we performed 4-fold stratified cross-validation. Within each fold, the training data are further split into 80% for training and 20% for validation. ... Linear models are trained using Scikit-learn's LBFGS solver with an $\ell_2$ penalty (regularization parameter 0.001). MLPs are trained with the Adam optimizer (learning rate 0.001), early stopping, and $\ell_2$ regularization parameter 0.001.
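
The split protocol quoted in the Dataset Splits and Experiment Setup rows (4-fold stratified cross-validation, with each fold's training data further divided 80%/20% into training and validation) can be sketched roughly as follows. This is a minimal standard-library illustration and not the authors' code: the function names `stratified_folds` and `nested_splits`, the seeds, and the toy labels are all hypothetical, and the inner 80/20 split is assumed here to be a plain random split because the quoted text does not say whether it is stratified.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=4, seed=0):
    """Assign sample indices to n_folds folds, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a near-equal share of the class.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def nested_splits(labels, n_folds=4, val_fraction=0.2, seed=0):
    """Yield (train, val, test) index lists, one triple per outer fold."""
    folds = stratified_folds(labels, n_folds, seed)
    rng = random.Random(seed + 1)
    for k in range(n_folds):
        test = sorted(folds[k])
        # Everything outside the held-out fold is the fold's training data.
        rest = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        # Assumption: the inner 80/20 split is a plain random split.
        rng.shuffle(rest)
        n_val = round(val_fraction * len(rest))
        val, train = sorted(rest[:n_val]), sorted(rest[n_val:])
        yield train, val, test


# Toy labels (hypothetical): 60 negatives and 40 positives.
labels = [0] * 60 + [1] * 40
splits = list(nested_splits(labels))
```

With these toy labels, each outer fold holds out 25 samples (15 negatives and 10 positives), and the remaining 75 are divided into 60 for training and 15 for validation. Model fitting on each training split, with the optimizers and regularization values quoted above, is left out of the sketch.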