Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Conformal Mixed-Integer Constraint Learning with Feasibility Guarantees

Authors: Daniel Ovalle, Lorenz Biegler, Ignacio Grossmann, Carl Laird, Mateo Dulce Rubio

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on real-world applications demonstrate that C-MICL consistently achieves target feasibility rates, maintains competitive objective performance, and significantly reduces computational cost compared to existing methods. Our work bridges mathematical optimization and machine learning, offering a principled approach to incorporate uncertainty-aware constraints into decision-making with rigorous statistical guarantees.
Researcher Affiliation	Academia	Daniel Ovalle Department of Chemical Engineering Carnegie Mellon University EMAIL Lorenz T. Biegler Department of Chemical Engineering Carnegie Mellon University EMAIL Ignacio E. Grossmann Department of Chemical Engineering Carnegie Mellon University EMAIL Carl D. Laird Department of Chemical Engineering Carnegie Mellon University EMAIL Mateo Dulce Rubio Center for Data Science New York University EMAIL
Pseudocode	No	The paper describes methods and formulations in text and mathematical equations, and mentions details in appendices (e.g., Appendix A for embedding models into MIPs, Appendix C for conformal set MIP reformulations), but does not contain an explicitly labeled 'Pseudocode' or 'Algorithm' block in structured format.
Open Source Code	Yes	The code required to reproduce all numerical experiments presented in this section is publicly available in our GitHub repository, which additionally includes detailed tutorials on conformal prediction and its integration into Pyomo-based optimization formulations. GitHub: https://github.com/dovallev/c-micl
Open Datasets	Yes	We use the dataset of 5,000 food baskets from Maragno et al. [7], where palatability scores range continuously in [0, 1]. ... The dataset used in this analysis is publicly available at and published by Maragno et al. [7] here. ... The dataset for the basket case study is publicly available and properly referenced, while the dataset for the reactor case study is included in the supplementary material.
Dataset Splits	Yes	For our proposed C-MICL method, which only requires training two models: ĥ(x) and û(x). We use the same model architectures and hyperparameters as the baselines, allocating 80% of the data for training, and the remaining 20% for conformal calibration.
Hardware Specification	Yes	All computations were performed on a Linux machine running Ubuntu, equipped with eight Intel Xeon Gold 6234 CPUs (3.30 GHz) and 1 TB of RAM, utilizing a total of eight hardware threads.
Software Dependencies	Yes	All optimization problems were solved using the MILP solver Gurobi v12.0.1 with a relative optimality gap of 1% [63]. Machine learning models were implemented using scikit-learn and PyTorch, and subsequently integrated into Pyomo-based [64] optimization formulations via the open-source library OMLT [65].
Experiment Setup	Yes	For the Linear-Model Decision Tree, we set the maximum depth to five, the minimum number of samples required to split an internal node to ten, and the number of bins used for discretization to forty. For the Random Forest, we used fifteen estimators, a maximum depth of five, a minimum samples split of three, and considered sixty percent of the features when looking for the best split. The Gradient Boosting Tree model was configured with fifteen estimators, a learning rate of 0.2, a maximum depth of five, a minimum of five samples per split, and sixty percent of the features considered at each split. Lastly, the ReLU Neural Network was set up with two hidden layers of 32 units each, an L2 regularization strength of 0.01, and trained for 2000 epochs using the Adam optimizer.
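
Illustrative sketch (not taken from the paper or its repository): the Dataset Splits row reports an 80% training and 20% conformal calibration allocation. The Python snippet below shows one way such a split could be made for the 5,000-sample basket dataset, together with the textbook split-conformal quantile rule. The function names, the random seed, and the use of this particular quantile rule are assumptions for illustration only; the authors' actual procedure is the one in their repository.

```python
import math
import random


def split_train_calibration(n_samples, train_fraction=0.8, seed=0):
    """Shuffle sample indices and split them into training and calibration sets.

    Hypothetical helper: the name, seed, and shuffling strategy are assumptions.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    This is the standard split-conformal rule, assumed here for illustration.
    Returns infinity when the calibration set is too small for the level.
    """
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return math.inf
    return sorted(scores)[rank - 1]


# 5,000 samples, as reported for the basket dataset; 80/20 split as reported.
train_idx, calib_idx = split_train_calibration(5000)
print(len(train_idx), len(calib_idx))
```

In a sketch like this, the calibration scores passed to conformal_quantile would be computed only on the held-out 20%, so that the resulting threshold is not fitted to the data used for training.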