Descriptive Clustering: ILP and CP Formulations with Applications
Authors: Thi-Bich-Hanh Dao, Chia-Tung Kuo, S. S. Ravi, Christel Vrain, Ian Davidson
IJCAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Preliminary results demonstrate the utility of our approach on real data sets for images and electronic health care records and that it outperforms single objective and multiview clustering baselines. [...] We demonstrate the usefulness of our approach on real data sets including images and health care records. |
| Researcher Affiliation | Academia | Thi-Bich-Hanh Dao¹, Chia-Tung Kuo², S. S. Ravi³,⁴, Christel Vrain¹, Ian Davidson²; ¹ LIFO, University of Orléans, France; ² University of California, Davis; ³ Virginia Tech; ⁴ University at Albany, SUNY |
| Pseudocode | Yes | Algorithm 1 presents a general iterative scheme to find a complete and minimal set of Pareto optimal solutions using our earlier defined constraints C as sub-problems. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology or a link to a code repository. |
| Open Datasets | Yes | The data set contains 30000 images from 50 classes of animals and 85 distinct (binary) tags describing the animals such as black, stripes, water, etc. Each class is associated with a (non-empty) subset of the 85 tags. We randomly sample 100 images from each of the first 10 animal classes: antelope, grizzly bear, killer whale, beaver, dalmatian, persian cat, horse, german shepherd, blue whale, siamese cat. We cluster the data using pairwise Euclidean distance between images based on the 2000 dimensional SIFT features used in [Lampert et al., 2009] and describe it using the 85 tags. |
| Dataset Splits | No | The paper does not specify exact percentages or absolute sample counts for training, validation, and test splits, nor does it reference predefined splits with citations for these purposes. It mentions sampling data for experiments but not partitioning into distinct training/validation/test sets. |
| Hardware Specification | No | The paper mentions that the CP model runs on "a laptop" and the ILP model on "a 48 core cluster" but does not provide specific details such as CPU/GPU models, memory, or other hardware specifications. |
| Software Dependencies | No | The paper mentions "ILP models are implemented in Gurobi using its MATLAB interface" and "CP models are implemented using Gecode solver" but does not provide specific version numbers for Gurobi, MATLAB, or Gecode. |
| Experiment Setup | Yes | We run our bi-objective formulation with k = 5 and present its Pareto front in Figure 3(a). [...] We apply our bi-objective formulation (with k = 5) where the first objective minimizes the diameter and the second objective looks for minimum tag disagreement (MTD) within a cluster. |
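
The Pseudocode and Experiment Setup rows refer to a Pareto front over two minimized objectives (cluster diameter and tag disagreement). As a minimal sketch of what that front contains, and not the paper's Algorithm 1 (which solves constrained sub-problems with an ILP or CP solver), the following filters a list of already-computed candidate solutions down to the non-dominated ones. The function name `pareto_front`, the `(diameter, disagreement)` pair representation, and the sample values are assumptions made for illustration only.

```python
from typing import List, Tuple

# Hypothetical representation: (diameter, tag_disagreement), both minimized.
Point = Tuple[float, float]


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points when both objectives are minimized."""
    front: List[Point] = []
    best_second = float("inf")
    # Sort by the first objective, breaking ties by the second;
    # set() drops exact duplicates so each front point appears once.
    for p in sorted(set(points)):
        # Every earlier point is at least as good on the first objective,
        # so p survives only if it strictly improves the second one.
        if p[1] < best_second:
            front.append(p)
            best_second = p[1]
    return front


# Illustrative values only, not results from the paper.
candidates = [(3.0, 10.0), (4.0, 7.0), (4.0, 9.0), (5.0, 7.0), (6.0, 2.0)]
front = pareto_front(candidates)
print(front)
```

Each point kept in `front` trades a larger value of one objective for a smaller value of the other; `(4.0, 9.0)` and `(5.0, 7.0)` are dropped because `(4.0, 7.0)` is at least as good on both objectives.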