Affinity Learning for Mixed Data Clustering

Authors: Nan Li, Longin Jan Latecki

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on many real world data sets demonstrate that the proposed framework is effective for mixed data clustering.
Researcher Affiliation | Academia | Nan Li and Longin Jan Latecki, Department of Computer and Information Sciences, Temple University, Philadelphia, USA {nan.li,latecki}@temple.edu
Pseudocode | No | The paper includes mathematical equations and descriptions of the steps of the proposed methods, but it does not present any formal pseudocode blocks or algorithms labeled as such.
Open Source Code | No | The paper does not provide any explicit statements about making its source code open or a link to a code repository.
Open Datasets | Yes | We evaluate the performance of the proposed clustering framework on several real world data sets from the UCI Machine Learning Repository, including 5 mixed-type (Acute Inflammations, Heart Disease, Credit Approval, Contraceptive Method Choice and Adult) and 2 categorical (Soybean and Tic-Tac-Toe Endgame). The detailed information of these data sets is summarized in Table 1.
Dataset Splits | No | The paper mentions using a "training set" for the Adult data set and states that "all the tests are run for 100 times", but it does not specify training/validation/test splits (e.g., percentages or explicit counts for a validation set) or a procedure such as cross-validation for model tuning.
Hardware Specification | Yes | All the experiments are implemented in MATLAB R2016a and conducted on a PC with Intel(R) Core(TM) i7 processor up to 3.4 GHz and 16GB RAM.
Software Dependencies | No | The paper states that experiments are "implemented in MATLAB R2016a". Although that is a specific version, it names a programming environment rather than particular libraries or solvers, and the paper lists no other software dependencies with version numbers that would be needed for exact replication.
Experiment Setup | Yes | For the three derived clustering algorithms, we vary the scaling factor in equation (9) in the range of (max(a_max, ρ(A)), 4 max(a_max, ρ(A))] (see equation (8)) with a step size of 10. The best results achieved by each algorithm in this process are reported. For FRA+CL and FRA+KM, we use the squared Euclidean distance measure. ... the maximum number of iterations is set to be 1000 for the Adult data set, which contains much more data, and 100 for the other 6 data sets. Moreover, all the tests are run for 100 times and the average results are reported. For all the clustering algorithms above, we set the target number of clusters to be the number of classes in each data set.
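
The scaling-factor sweep quoted under Experiment Setup can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' MATLAB code. The function names (spectral_radius, scaling_grid) are hypothetical. The sketch assumes a_max is the largest entry of the affinity matrix A and ρ(A) is its spectral radius, and it uses power iteration as a stand-in for whatever eigenvalue routine the paper actually used. The step is left as a parameter. The toy example uses a step of 1.5 instead of the quoted 10 so that a 2x2 matrix yields a few grid points.

```python
import math


def spectral_radius(A, iters=1000, tol=1e-12):
    """Estimate rho(A) by power iteration.

    Assumption: A is a square, symmetric, nonnegative matrix (list of lists),
    as is typical for an affinity matrix.
    """
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n  # positive unit-norm start vector
    rho = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # A maps v to zero, e.g. the all-zero matrix
            return 0.0
        v = [x / norm for x in w]
        if abs(norm - rho) < tol:
            return norm
        rho = norm
    return rho


def scaling_grid(A, step):
    """Return (lower, grid) with grid values in the half-open interval
    (lower, 4 * lower], where lower = max(a_max, rho(A)).

    Assumption: 'step' is an additive step size between candidate values.
    """
    a_max = max(max(row) for row in A)
    lower = max(a_max, spectral_radius(A))
    upper = 4.0 * lower
    grid = []
    k = 1
    # Small tolerance so the closed upper endpoint survives rounding error.
    while lower + k * step <= upper + 1e-9:
        grid.append(lower + k * step)
        k += 1
    return lower, grid


# Toy affinity matrix with eigenvalues 1.5 and 0.5, so rho(A) > a_max.
A = [[1.0, 0.5],
     [0.5, 1.0]]
lower, grid = scaling_grid(A, step=1.5)
print(round(lower, 6), [round(s, 6) for s in grid])  # 1.5 [3.0, 4.5, 6.0]
```

Following the quoted protocol, each value in the grid would be passed to the clustering algorithm, and the best result over the grid would be reported. The lower endpoint itself is excluded because the stated interval is open on the left.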