Pattern Decomposition with Complex Combinatorial Constraints: Application to Materials Discovery

Authors: Stefano Ermon, Ronan Le Bras, Santosh Suram, John Gregoire, Carla Gomes, Bart Selman, Robert van Dover

AAAI 2015

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our approach considerably outperforms the state of the art on the materials discovery problem, scaling to larger datasets and recovering more precise and physically meaningful decompositions. We also show the effectiveness of our approach for enforcing background knowledge in other application domains. (Relevant section headings: "Experiments"; "Encoding domain knowledge as additional constraints".)
Researcher Affiliation | Academia | Stefano Ermon, Computer Science Department, Stanford University (ermon@cs.stanford.edu); Ronan Le Bras, Department of Computer Science, Cornell University (lebras@cs.cornell.edu); Santosh K. Suram and John M. Gregoire, Joint Center for Artificial Photosynthesis, California Institute of Technology ({sksuram, gregoire}@caltech.edu); Carla P. Gomes and Bart Selman, Department of Computer Science, Cornell University ({gomes,selman}@cornell.edu); Robert B. van Dover, Department of Materials Science and Engineering, Cornell University (rbv2@cornell.edu)
Pseudocode | Yes | Algorithm 1: AMIQO
Open Source Code | No | The paper does not provide a direct link to open-source code or explicitly state that the code is publicly available.
Open Datasets | Yes | We consider a semi-supervised clustering task in which we assume prior information on the labels (equivalently, on the cluster assignments) of a subset of data points. Specifically, we assume information about pairs of data points that should either belong to the same cluster (Must-Link) or not (Cannot-Link). This information is obtained from standard labeled datasets in the UCI repository (Bache and Lichman 2013), for which a ground-truth clustering is known. In this experiment, we consider the Zoo dataset from UCI (Bache and Lichman 2013). We also consider synthetic data from (Le Bras et al. 2014), generated from the Aluminium (Al)-Lithium (Li)-Iron (Fe) oxide system, for which the ground truth is known.
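As an illustration of how Must-Link and Cannot-Link pairs can be derived from a labeled dataset with known ground truth, the sketch below enumerates all pairs within a labeled subset and counts how many of them a candidate clustering violates. It is a minimal sketch under stated assumptions, not the authors' procedure: the function names (`pairwise_constraints`, `count_violations`, `one_hot`, `satisfies_linear_encoding`), the toy labels, and the choice of subset are all hypothetical, and the report does not say how the paper selects the labeled subset. The linear encoding checked at the end (binary assignment variables `z[i][k]`, with `z[i][k] == z[j][k]` for Must-Link and `z[i][k] + z[j][k] <= 1` for Cannot-Link) is one standard way to express these constraints in an integer program; the paper's exact encoding may differ.

```python
import itertools
from typing import Hashable, List, Sequence, Tuple

Pair = Tuple[int, int]


def pairwise_constraints(
    labels: Sequence[Hashable], subset: Sequence[int]
) -> Tuple[List[Pair], List[Pair]]:
    """Derive Must-Link / Cannot-Link pairs from ground-truth labels.

    Only points in `subset` (the hypothetical labeled subset) contribute pairs.
    """
    must_link: List[Pair] = []
    cannot_link: List[Pair] = []
    for i, j in itertools.combinations(sorted(subset), 2):
        if labels[i] == labels[j]:
            must_link.append((i, j))  # same true class -> same cluster
        else:
            cannot_link.append((i, j))  # different true class -> different clusters
    return must_link, cannot_link


def count_violations(
    assignment: Sequence[int], must_link: List[Pair], cannot_link: List[Pair]
) -> int:
    """Count pairwise constraints violated by a hard cluster assignment."""
    ml = sum(1 for i, j in must_link if assignment[i] != assignment[j])
    cl = sum(1 for i, j in cannot_link if assignment[i] == assignment[j])
    return ml + cl


def one_hot(assignment: Sequence[int], num_clusters: int) -> List[List[int]]:
    """Binary assignment matrix: z[i][k] = 1 iff point i is in cluster k."""
    return [[1 if k == a else 0 for k in range(num_clusters)] for a in assignment]


def satisfies_linear_encoding(
    z: List[List[int]], must_link: List[Pair], cannot_link: List[Pair]
) -> bool:
    """Check the (assumed) linear encoding of the pairwise constraints.

    Must-Link (i, j):   z[i][k] == z[j][k]     for every cluster k
    Cannot-Link (i, j): z[i][k] + z[j][k] <= 1 for every cluster k
    """
    num_clusters = len(z[0]) if z else 0
    ml_ok = all(z[i][k] == z[j][k] for i, j in must_link for k in range(num_clusters))
    cl_ok = all(z[i][k] + z[j][k] <= 1 for i, j in cannot_link for k in range(num_clusters))
    return ml_ok and cl_ok


if __name__ == "__main__":
    # Hypothetical toy ground truth (not the Zoo dataset): 6 points, 2 classes.
    labels = ["a", "a", "a", "b", "b", "b"]
    ml, cl = pairwise_constraints(labels, [0, 1, 3, 4])  # reveal 4 labels
    print(count_violations([0, 0, 0, 1, 1, 1], ml, cl))  # 0
    print(count_violations([0, 1, 0, 1, 1, 1], ml, cl))  # 3
    print(satisfies_linear_encoding(one_hot([0, 0, 0, 1, 1, 1], 2), ml, cl))  # True
```

For a one-hot assignment, the linear check returns `True` exactly when `count_violations` is zero; the counting version is useful for reporting how far an unconstrained clustering is from the supplied background knowledge.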
Dataset Splits | No | No explicit train/validation/test splits (e.g., percentages or absolute counts) are provided. The paper mentions averaging results over 100 runs but does not specify how the data was partitioned for training, validation, or testing in a reproducible manner.
Hardware Specification | No | The paper refers to "CPU time" and to generic computing "infrastructure supported by the NSF" and the "SLAC National Accelerator Laboratory", but it does not specify hardware details such as CPU or GPU models, core counts, or memory used for the experiments.
Software Dependencies | No | The paper mentions using "state-of-the-art mixed-integer quadratic programming (MIQP) solvers such as IBM CPLEX" but does not specify version numbers for CPLEX or any other software dependency, which are needed for reproducibility.
Experiment Setup | No | The paper describes the constraints and the algorithm (AMIQO) in detail, but it does not report the specific parameter values used (e.g., solver settings such as time limits or optimality tolerances, iteration limits, or initialization choices) or the system-level configuration needed to reproduce the experimental setup.