Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Affinity Learning for Mixed Data Clustering

Authors: Nan Li, Longin Jan Latecki

IJCAI 2017 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experimental results on many real world data sets demonstrate that the proposed framework is effective for mixed data clustering.
Researcher Affiliation	Academia	Nan Li and Longin Jan Latecki Department of Computer and Information Sciences Temple University, Philadelphia, USA EMAIL
Pseudocode	No	The paper includes mathematical equations and descriptions of the steps of the proposed methods, but it does not present any formal pseudocode blocks or algorithms labeled as such.
Open Source Code	No	The paper does not provide any explicit statements about making its source code open or a link to a code repository.
Open Datasets	Yes	We evaluate the performance of the proposed clustering framework on several real world data sets from the UCI Machine Learning Repository, including 5 mixed-type (Acute Inflammations, Heart Disease, Credit Approval, Contraceptive Method Choice and Adult) and 2 categorical (Soybean and Tic-Tac-Toe Endgame). The detailed information of these data sets is summarized in Table 1.
Dataset Splits	No	The paper mentions using a "training set" for the Adult dataset and states that "all the tests are run for 100 times", but it does not give specific details on training/validation/test splits (e.g., percentages or explicit counts for a validation set) or on methods such as cross-validation for model tuning, which are typically needed to reproduce the data partitioning.
Hardware Specification	Yes	All the experiments are implemented in MATLAB R2016a and conducted on a PC with Intel(R) Core(TM) i7 processor up to 3.4 GHz and 16GB RAM.
Software Dependencies	No	The paper mentions that experiments are "implemented in MATLAB R2016a". While MATLAB R2016a is a specific software version, it is a programming environment rather than a specific library, solver, or multiple key software components with version numbers. The paper does not list other specific software dependencies or libraries with their version numbers that would be crucial for exact replication.
Experiment Setup	Yes	For the three derived clustering algorithms, we vary the scaling factor in equation 9 in the range of (max(a_max, ρ(A)), 4 max(a_max, ρ(A))] (see equation (8)) with a step size of 10. The best results achieved by each algorithm in this process are reported. For FRA+CL and FRA+KM, we use the squared Euclidean distance measure. ... the maximum number of iterations is set to be 1000 for the Adult data set, which contains much more data, and 100 for the other 6 data sets. Moreover, all the tests are run for 100 times and the average results are reported. For all the clustering algorithms above, we set the target number of clusters to be the number of classes in each data set.
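
The sweep quoted in the Experiment Setup row can be sketched roughly as follows. This is a minimal Python illustration, not the authors' MATLAB code. The function names (`spectral_radius`, `scaling_grid`, `sweep`) and the `score_fn` callback are hypothetical. The sketch assumes the grid starts one step above the lower bound max(a_max, ρ(A)), because the quoted interval excludes that bound, and it takes a_max as an input because this page does not define it.

```python
import numpy as np


def spectral_radius(A):
    """Return rho(A), the largest absolute eigenvalue of a square matrix."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))


def scaling_grid(a_max, rho, step=10.0):
    """Candidate scaling factors in the half-open interval (lo, 4*lo].

    lo = max(a_max, rho). Assumption: the first candidate is lo + step,
    since lo itself lies outside the interval.
    """
    lo = max(a_max, rho)
    hi = 4.0 * lo
    # Small tolerance so an endpoint that is hit exactly is not lost to rounding.
    n = int(np.floor((hi - lo) / step + 1e-9))
    return [lo + k * step for k in range(1, n + 1)]


def sweep(score_fn, grid, n_runs=100, seed=0):
    """Average score_fn over n_runs for each factor; return (best_factor, best_mean).

    score_fn(factor, rng) is a hypothetical callback that runs one clustering
    trial and returns a quality score where higher is better.
    Returns None if the grid is empty.
    """
    rng = np.random.default_rng(seed)
    best = None
    for factor in grid:
        mean_score = float(np.mean([score_fn(factor, rng) for _ in range(n_runs)]))
        if best is None or mean_score > best[1]:
            best = (factor, mean_score)
    return best
```

In use, `score_fn` would wrap one run of a clustering algorithm and return an evaluation score; `n_runs=100` mirrors the quoted "all the tests are run for 100 times". Note that selecting the best factor by evaluation score, as the quoted setup describes, relies on ground-truth labels.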