reproducibilityindex.ai

Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery

Authors: Stefano Ermon, Ronan Le Bras, Santosh Suram, John Gregoire, Carla Gomes, Bart Selman, Robert van Dover

AAAI 2015

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our approach considerably outperforms the state of the art on the materials discovery problem, scaling to larger datasets and recovering more precise and physically meaningful decompositions. We also show the effectiveness of our approach for enforcing background knowledge on other application domains. [...] Experiments [...] Encoding domain knowledge as additional constraints
Researcher Affiliation	Academia	Stefano Ermon, Computer Science Department, Stanford University, ermon@cs.stanford.edu; Ronan Le Bras, Department of Computer Science, Cornell University, lebras@cs.cornell.edu; Santosh K. Suram and John M. Gregoire, Joint Center for Artificial Photosynthesis, California Institute of Technology, {sksuram, gregoire}@caltech.edu; Carla P. Gomes and Bart Selman, Department of Computer Science, Cornell University, {gomes,selman}@cornell.edu; Robert B. van Dover, Department of Materials Science and Engineering, Cornell University, rbv2@cornell.edu
Pseudocode	Yes	Algorithm 1 AMIQO
Open Source Code	No	The paper does not provide a direct link to open-source code or explicitly state that the code is publicly available.
Open Datasets	Yes	We consider a semi-supervised clustering task where we assume to have some prior information on the labels (equivalently, on the cluster assignment) of a subset of datapoints. Specifically, we assume to have information about pairs of data points, which should either belong to the same cluster (Must-Link) or not (Cannot-Link). This information is obtained using standard labeled datasets from the UCI repository (Bache and Lichman 2013) for which a ground truth clustering is known. [...] In this experiment, we consider the Zoo dataset from UCI (Bache and Lichman 2013). [...] We consider synthetic data from (Le Bras et al. 2014), generated from the Aluminium(Al)-Lithium(Li)-Iron(Fe) oxide system and for which the ground truth is known.
Dataset Splits	No	No explicit train/validation/test dataset splits (e.g., percentages or absolute counts) are provided. The paper mentions averaging results over 100 runs but does not specify how the data was partitioned for training, validation, or testing in a reproducible manner.
Hardware Specification	No	The paper refers to 'CPU time' and generic computing 'infrastructure supported by the NSF' and 'SLAC National Accelerator Laboratory' but does not provide specific details on hardware components such as GPU/CPU models, memory, or processor types used for experiments.
Software Dependencies	No	The paper mentions using 'state-of-the-art mixed-integer quadratic programming (MIQP) solvers such as IBM CPLEX' but does not specify version numbers for CPLEX or any other software dependencies, which are necessary for reproducibility.
Experiment Setup	No	The paper describes the constraints and the algorithm (AMIQO) in detail, but it does not provide specific hyperparameter values (e.g., learning rates, batch sizes, optimization parameters) or detailed system-level training configurations needed to reproduce the experimental setup.
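
The Open Datasets row quotes the paper's description of Must-Link and Cannot-Link pairs derived from a ground-truth clustering. As a rough illustration only, the sketch below shows one way such pairs could be derived from labels. The function name `sample_pairwise_constraints`, the uniform sampling of pairs, the `seed` parameter, and the toy labels are all assumptions made for this example; the paper's actual sampling procedure is not given in the excerpt above.

```python
import itertools
import random


def sample_pairwise_constraints(labels, num_pairs, seed=0):
    """Sample index pairs and split them into Must-Link / Cannot-Link sets.

    Hypothetical helper: uniform sampling over all pairs is an assumption,
    not the procedure used in the paper.
    """
    rng = random.Random(seed)
    # Enumerate every unordered pair of data-point indices.
    all_pairs = list(itertools.combinations(range(len(labels)), 2))
    chosen = rng.sample(all_pairs, min(num_pairs, len(all_pairs)))
    # Same ground-truth label -> Must-Link; different label -> Cannot-Link.
    must_link = [(i, j) for i, j in chosen if labels[i] == labels[j]]
    cannot_link = [(i, j) for i, j in chosen if labels[i] != labels[j]]
    return must_link, cannot_link


# Toy labels standing in for a labeled dataset with known clustering.
labels = ["mammal", "bird", "mammal", "fish", "bird"]
must_link, cannot_link = sample_pairwise_constraints(labels, num_pairs=4)
print("Must-Link:", must_link)
print("Cannot-Link:", cannot_link)
```

Fixing the seed, as done here, is the kind of detail the Dataset Splits and Experiment Setup rows flag as missing: without it, the sampled constraint set differs from run to run.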