A Framework for Minimal Clustering Modification via Constraint Programming

Authors: Chia-Tung Kuo, S. S. Ravi, Thi-Bich-Hanh Dao, Christel Vrain, Ian Davidson

AAAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We empirically validate its usefulness through experiments on social network and medical imaging data sets."
Researcher Affiliation | Academia | Chia-Tung Kuo, University of California, Davis (tomkuo@ucdavis.edu); S. S. Ravi, University at Albany (sravi@albany.edu); Thi-Bich-Hanh Dao, University of Orléans (thi-bich-hanh.dao@univ-orleans.fr); Christel Vrain, University of Orléans (Christel.Vrain@univ-orleans.fr); Ian Davidson, University of California, Davis (davidson@cs.ucdavis.edu)
Pseudocode | Yes | "Figure 2: CP optimization encoding where the user provides a set of desired (feature-wise) diameters D as feedback."
Open Source Code | Yes | "We provide enough details to reproduce our results and our code is made available" (https://sites.google.com/site/chiatungkuo/publication)
Open Datasets | Yes | "We apply our proposed approach to a network data set: Facebook-egonets from Stanford SNAP Data sets (Leskovec and Krevl 2014)."
Dataset Splits | No | The paper mentions running k-means multiple times and selecting the best result, but does not provide train/validation/test split percentages, sample counts, or a detailed splitting methodology for reproducibility.
Hardware Specification | No | "Consequently our experiments on the Facebook data (n = 4039, k = 4, f = 2) and fMRI data (n = 1730, k = 4, f = 2) each take less than 2 minutes to finish on a 12-core workstation."
Software Dependencies | No | "Note that we chose to implement our model in the CP language Numberjack (Hebrard, O'Mahony, and O'Sullivan 2010) due to its simple interface and its use of state-of-the-art integer linear program (ILP) solvers. ILP solvers such as Gurobi (Inc. 2015) (used in our experiments) can easily exploit multi-core architectures."
Experiment Setup | Yes | "We choose the upper and lower bounds according to the averages in the initial summary and set bounds [0.36, 0.4] for gender and [0.13, 0.15] for language so that these two features are balanced across clusters."
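To illustrate the experiment-setup row above, the following Python sketch checks whether each cluster's feature-wise averages fall within user-supplied bounds such as the paper's [0.36, 0.4] for gender and [0.13, 0.15] for language. The helper names and toy data are hypothetical; this is not the paper's actual CP encoding (which is given in its Figure 2), only a minimal sanity check of the "balanced across clusters" condition:

```python
def feature_means(rows):
    """Per-feature averages over a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(row[j] for row in rows) / n for j in range(len(rows[0]))]

def clusters_within_bounds(X, labels, bounds):
    """Return True iff, in every cluster, the average of feature j
    lies inside bounds[j] = (lower, upper).

    X: list of binary feature vectors, e.g. [gender, language] per node
    labels: cluster assignment for each row of X
    bounds: one (lower, upper) pair per feature
    """
    clusters = {}
    for row, c in zip(X, labels):
        clusters.setdefault(c, []).append(row)
    for rows in clusters.values():
        for (lo, hi), m in zip(bounds, feature_means(rows)):
            if not lo <= m <= hi:
                return False
    return True

# Toy data: two clusters of 15 nodes each; in each cluster 6/15 = 0.4
# nodes have gender = 1 and 2/15 ~ 0.133 have language = 1, so both
# features sit inside the paper's bounds [0.36, 0.4] and [0.13, 0.15].
cluster = [[1, 0]] * 6 + [[0, 1]] * 2 + [[0, 0]] * 7
X = cluster + cluster
labels = [0] * 15 + [1] * 15
bounds = [(0.36, 0.40), (0.13, 0.15)]
```

With this data, `clusters_within_bounds(X, labels, bounds)` returns True; tightening the gender bound (say to (0.5, 0.6)) makes it return False, signaling that the clustering would need modification.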