Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026].

Descriptive Clustering: ILP and CP Formulations with Applications

Authors: Thi-Bich-Hanh Dao, Chia-Tung Kuo, S. S. Ravi, Christel Vrain, Ian Davidson

IJCAI 2018

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Preliminary results demonstrate the utility of our approach on real data sets for images and electronic health care records and that it outperforms single objective and multiview clustering baselines. [...] We demonstrate the usefulness of our approach on real data sets including images and health care records.
Researcher Affiliation	Academia	Thi-Bich-Hanh Dao (1), Chia-Tung Kuo (2), S. S. Ravi (3, 4), Christel Vrain (1), Ian Davidson (2); (1) LIFO, University of Orléans, France; (2) University of California, Davis; (3) Virginia Tech; (4) University at Albany, SUNY
Pseudocode	Yes	Algorithm 1 presents a general iterative scheme to find a complete and minimal set of Pareto optimal solutions using our earlier defined constraints C as sub-problems.
Open Source Code	No	The paper does not provide an explicit statement about releasing source code for the described methodology or a link to a code repository.
Open Datasets	Yes	The data set contains 30000 images from 50 classes of animals and 85 distinct (binary) tags describing the animals such as black, stripes, water, etc. Each class is associated with a (non-empty) subset of the 85 tags. We randomly sample 100 images from each of the first 10 animal classes: antelope, grizzly bear, killer whale, beaver, dalmatian, persian cat, horse, german shepherd, blue whale, siamese cat. We cluster the data using pairwise Euclidean distance between images based on the 2000-dimensional SIFT features used in [Lampert et al., 2009] and describe it using the 85 tags.
Dataset Splits	No	The paper does not specify exact percentages or absolute sample counts for training, validation, and test splits, nor does it reference predefined splits with citations for these purposes. It mentions sampling data for experiments but not partitioning into distinct training/validation/test sets.
Hardware Specification	No	The paper mentions that the CP model runs on "a laptop" and the ILP model on "a 48 core cluster" but does not provide specific details such as CPU/GPU models, memory, or other hardware specifications.
Software Dependencies	No	The paper mentions "ILP models are implemented in Gurobi using its MATLAB interface" and "CP models are implemented using Gecode solver" but does not provide specific version numbers for Gurobi, MATLAB, or Gecode.
Experiment Setup	Yes	We run our bi-objective formulation with k = 5 and present its Pareto front in Figure 3(a). [...] We apply our bi-objective formulation (with k = 5) where the first objective minimizes the diameter and the second objective looks for minimum tag disagreement (MTD) within a cluster.
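Because no source code is released, the following is only a rough sketch of how an iterative scheme for a complete, minimal Pareto front over the two quoted objectives (cluster diameter and tag disagreement) might look. It is not the authors' Algorithm 1: the generic epsilon-constraint pattern shown here (minimize the first objective, break ties on the second, then require the second objective to be strictly smaller) is an assumption, brute-force enumeration stands in for the Gurobi ILP and Gecode CP solvers, and the toy data (`POSITIONS`, `TAGS`, `K`) and the `tag_disagreement` definition are hypothetical stand-ins for the paper's SIFT features and MTD objective.

```python
from itertools import product

# Hypothetical toy instance: 1-D positions stand in for the SIFT feature vectors,
# and each instance carries a set of descriptive tags.
POSITIONS = [0.0, 1.0, 1.5, 5.0, 6.0, 6.5]
TAGS = [{"black"}, {"black", "stripes"}, {"stripes"}, {"water"}, {"water", "black"}, {"water"}]
K = 2


def assignments(n, k):
    """Yield every labelling of n instances that uses all k clusters (brute force)."""
    for labels in product(range(k), repeat=n):
        if len(set(labels)) == k:
            yield labels


def diameter(labels):
    """Objective 1: largest distance between two instances in the same cluster."""
    n = len(labels)
    return max(
        (abs(POSITIONS[i] - POSITIONS[j])
         for i in range(n) for j in range(i + 1, n) if labels[i] == labels[j]),
        default=0.0,
    )


def tag_disagreement(labels):
    """Objective 2 (hypothetical stand-in for MTD): per cluster, count the tags held
    by some but not all members, summed over clusters."""
    total = 0
    for c in set(labels):
        member_tags = [TAGS[i] for i, lab in enumerate(labels) if lab == c]
        total += len(set.union(*member_tags) - set.intersection(*member_tags))
    return total


def pareto_front(candidates, f1, f2):
    """Epsilon-constraint loop: minimise f1 (ties broken by f2) subject to f2 < bound,
    record that solution, tighten the bound to its f2 value, and repeat until infeasible.
    Each recorded solution is Pareto optimal, and one is kept per Pareto point."""
    front = []
    bound = float("inf")
    while True:
        feasible = [c for c in candidates if f2(c) < bound]
        if not feasible:
            return front
        best = min(feasible, key=lambda c: (f1(c), f2(c)))
        front.append(best)
        bound = f2(best)


if __name__ == "__main__":
    candidates = list(assignments(len(POSITIONS), K))
    for labels in pareto_front(candidates, diameter, tag_disagreement):
        print(labels, diameter(labels), tag_disagreement(labels))
```

In a solver-based version, each pass of the loop would presumably be one optimization call with the tightened bound added as a constraint, rather than a filter over an enumerated list.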