Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Understanding Fixed Predictions via Confined Regions
Authors: Connor Lawless, Tsui-Wei Weng, Berk Ustun, Madeleine Udell
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct a comprehensive empirical study of confined regions across diverse applications. Our results highlight that existing pointwise verification methods fail to anticipate future individuals with fixed predictions, while our method both identifies them and provides an interpretable description. |
| Researcher Affiliation | Academia | ¹Stanford University, ²University of California, San Diego. Correspondence to: Connor Lawless <EMAIL>. |
| Pseudocode | No | The paper describes methods using mathematical formulations and textual descriptions but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | We include code to reproduce our results at https://github.com/conlaw/confined_regions/ and provide additional details and results in Appendix E. |
| Open Datasets | Yes | We evaluate our approach on three real-world datasets in consumer finance (heloc (FICO, 2018), givemecredit (Kaggle, 2011)) and content moderation (twitterbot (Gilani et al., 2016)). |
| Dataset Splits | Yes | We split the processed dataset into a training sample (50% used to train the model), and an audit sample (used to evaluate responsiveness in deployment). |
| Hardware Specification | Yes | We run all experiments on a personal computer with an Apple M1 Pro chip and 32 GB of RAM. |
| Software Dependencies | Yes | All MILP and MIQCP problems were solved using Gurobi 9.0 (Achterberg, 2019) with default settings. |
| Experiment Setup | Yes | We use the training dataset to fit an ℓ1-regularized logistic regression model and tune its parameters via cross-validation. |
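The Dataset Splits row states only that 50% of the processed data is used to train the model and that a separate audit sample is held out to evaluate responsiveness. A minimal sketch of such a split follows, assuming the audit sample is simply every remaining row and that rows are shuffled with a fixed seed. `train_audit_split` and its parameters are hypothetical names, not taken from the authors' repository.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_audit_split(rows: Sequence[T], train_frac: float = 0.5,
                      seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle rows, then split them into a training sample and an audit sample.

    Assumption: every row not used for training goes to the audit sample.
    """
    if not 0.0 < train_frac < 1.0:
        raise ValueError("train_frac must lie strictly between 0 and 1")
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)  # a fixed seed makes the split reproducible
    n_train = int(train_frac * len(rows))  # floor, so odd sizes give the extra row to audit
    train = [rows[i] for i in order[:n_train]]
    audit = [rows[i] for i in order[n_train:]]
    return train, audit


if __name__ == "__main__":
    train, audit = train_audit_split(list(range(10)), seed=42)
    print(len(train), len(audit))  # 5 5
```

Using a dedicated `random.Random(seed)` instance keeps the split independent of any other code that touches the global random state.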
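The Research Type row contrasts pointwise verification with the paper's method. As a rough illustration of what a pointwise check asks, the sketch below assumes a linear classifier with score w·x + b and continuous actionable features that can each move independently within a box. Under those assumptions, the best reachable score has a closed form: each actionable feature goes to whichever bound the sign of its weight favors. This is not the paper's method, which solves MILPs and describes whole confined regions. A check like this only answers for the single individual it is given, which is the limitation the Research Type row points to. The function names, weights, and bounds are hypothetical.

```python
from typing import Dict, Sequence, Tuple


def max_reachable_score(x: Sequence[float], w: Sequence[float], b: float,
                        bounds: Dict[int, Tuple[float, float]]) -> float:
    """Largest score w.x' + b over actions that change only the features in `bounds`.

    `bounds` maps an actionable feature index to its (lower, upper) range; every
    other feature is held at its current value. Because the score is linear and
    the bounds are independent, the maximum is taken feature by feature.
    """
    score = b
    for j, (xj, wj) in enumerate(zip(x, w)):
        if j in bounds:
            lo, hi = bounds[j]
            score += wj * (hi if wj > 0 else lo)
        else:
            score += wj * xj
    return score


def has_fixed_prediction(x: Sequence[float], w: Sequence[float], b: float,
                         bounds: Dict[int, Tuple[float, float]]) -> bool:
    """True if no allowed action reaches a positive prediction (score >= 0)."""
    return max_reachable_score(x, w, b, bounds) < 0


if __name__ == "__main__":
    w, b = [2.0, -1.0, 0.5], -3.0
    bounds = {1: (0.0, 1.0), 2: (0.0, 2.0)}  # feature 0 is immutable in this toy example
    print(has_fixed_prediction([0.5, 1.0, 0.0], w, b, bounds))  # True
    print(has_fixed_prediction([1.5, 1.0, 0.0], w, b, bounds))  # False
```

With discrete features or constraints that couple features, this closed form no longer applies, which is one reason an optimization solver such as the one listed under Software Dependencies becomes necessary.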