Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Ordered Counterfactual Explanation by Mixed-Integer Linear Optimization
Authors: Kentaro Kanamori, Takuya Takagi, Ken Kobayashi, Yuichi Ike, Kento Uemura, Hiroki Arimura
Pages: 11564-11574
AAAI 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we conducted experiments on real datasets to investigate the effectiveness and behavior of our OrdCE. All the code was implemented in Python 3.7 with scikit-learn and IBM ILOG CPLEX v12.10. All the experiments were conducted on 64-bit macOS Catalina 10.15.6 with Intel Core i9 2.4GHz CPU and 64GB memory, and we imposed a 300 second time limit for solving. |
| Researcher Affiliation | Collaboration | 1Hokkaido University, 2Fujitsu Laboratories Ltd., 3Tokyo Institute of Technology |
| Pseudocode | No | The paper describes an "Algorithm" in numbered steps within a paragraph, but does not present it as a formal pseudocode block or algorithm environment. |
| Open Source Code | Yes | All the code is available at https://github.com/kelicht/ordce. |
| Open Datasets | Yes | We used four real datasets: FICO (D = 23) (FICO et al. 2018), German (D = 40), Wine Quality (D = 12), and Diabetes (D = 8) (Dua and Graff 2017) datasets |
| Dataset Splits | No | We randomly split each dataset into train (75%) and test (25%) instances, and trained ℓ2-regularized logistic regression classifiers (LR), random forest classifiers (RF) with T = 100 decision trees, and two-layer ReLU network classifiers (MLP) with T = 200 neurons, on each training dataset. The paper explicitly mentions train and test splits but does not specify a separate validation split. |
| Hardware Specification | Yes | All the experiments were conducted on 64-bit macOS Catalina 10.15.6 with Intel Core i9 2.4GHz CPU and 64GB memory, and we imposed a 300 second time limit for solving. |
| Software Dependencies | Yes | All the code was implemented in Python 3.7 with scikit-learn and IBM ILOG CPLEX v12.10. |
| Experiment Setup | No | The paper specifies parameters for its proposed method (e.g., γ = 1.0, K = 4) and model architectures (e.g., T=100 decision trees for RF, T=200 neurons for MLP), but it does not provide common training-specific hyperparameters such as learning rate, batch size, optimizer type, or number of training epochs for the classifiers. |
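
The random 75% / 25% train/test split quoted in the Dataset Splits row can be illustrated with a minimal sketch. It uses only the Python standard library rather than scikit-learn, which the paper reports using, so it is not the authors' code. The function name `split_train_test`, the seed value, and the placeholder rows are assumptions made for illustration; the paper does not report a random seed.

```python
import random


def split_train_test(rows, train_fraction=0.75, seed=0):
    """Shuffle a copy of `rows` and split it into (train, test) lists.

    `seed` is a hypothetical choice for repeatability; the paper does not
    report which seed, if any, was used.
    """
    shuffled = list(rows)                    # copy so the input is not mutated
    random.Random(seed).shuffle(shuffled)    # seeded, repeatable shuffle
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Placeholder data standing in for the instances of one dataset.
rows = list(range(1000))
train, test = split_train_test(rows)
print(len(train), len(test))  # 750 250
```

Fixing the seed makes the split repeatable across runs, which is the detail a reader would need in order to reproduce the exact train and test instances.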