Unidimensional Clustering of Discrete Data Using Latent Tree Models

Authors: April Liu, Leonard Poon, Nevin Zhang

AAAI 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive empirical studies have been conducted to compare the new method with LCM and several other methods (K-means, kernel K-means and spectral clustering) that are not model-based.
Researcher Affiliation | Academia | 1 Department of Computer Science and Engineering, The Hong Kong University of Science and Technology, Hong Kong ({aprillh, lzhang}@cse.ust.hk); 2 Department of Mathematics and Information Technology, The Hong Kong Institute of Education, Hong Kong (kmpoon@ied.edu.hk)
Pseudocode | Yes | Algorithm 1 shows the pseudo-code for our algorithm.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the described methodology.
Open Datasets | Yes | The real-world data sets were from the UCI machine learning repository.
Dataset Splits | No | The paper describes a process for learning LCMs in which the cardinality is gradually increased and the parameters are re-estimated until the model score ceases to increase (guided by AIC/BIC). While this acts as a form of model selection/validation, it does not specify explicit train/validation dataset splits with percentages or counts.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions methods such as the EM algorithm and algorithms from other papers, but does not specify software names with version numbers for dependencies (e.g., Python, PyTorch, or scikit-learn versions).
Experiment Setup | Yes | In our experiments, the threshold δ is set at 3 as suggested by Kass and Raftery (1995)... To do so, we initially set the cardinality of Y1 at 2 and optimized the probability parameters using the EM algorithm... Then the cardinality is gradually increased and the parameters are re-estimated after each increase. The process stops when the model score ceases to increase.
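The Experiment Setup and Dataset Splits rows describe the same model-selection loop: the latent variable Y1 starts with 2 states, the parameters are fitted by EM, the cardinality is grown step by step, and the search stops once the model score ceases to increase. Since no source code is released (see the Open Source Code row), the following is only a minimal sketch of that loop under stated assumptions: a latent class model parameterized as a mixture of independent categoricals, and BIC in the higher-is-better form (log-likelihood minus half the parameter count times log N) as the score. The names fit_lcm_em and search_cardinality are illustrative, not the authors' code, and the δ = 3 Bayes-factor threshold quoted above belongs to a different step of their algorithm and is not modeled here.

```python
import numpy as np

def fit_lcm_em(X, n_states, n_cats, n_iter=200, tol=1e-6, seed=0):
    """EM for a latent class model (LCM): one latent variable Y with
    n_states states; observed attributes are conditionally independent
    categoricals given Y. Returns the final log-likelihood."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = rng.dirichlet(np.ones(n_states))                       # P(Y)
    theta = rng.dirichlet(np.ones(n_cats), size=(n_states, d))  # P(X_j | Y)
    ll_old = -np.inf
    for _ in range(n_iter):
        # E-step: log P(Y = y, x_i) for each data case, in log space.
        log_p = np.log(pi)[None, :] + sum(
            np.log(theta[:, j, X[:, j]]).T for j in range(d))
        m = log_p.max(axis=1)
        lse = m + np.log(np.exp(log_p - m[:, None]).sum(axis=1))
        ll = lse.sum()
        post = np.exp(log_p - lse[:, None])     # posterior P(Y | x_i)
        # M-step: re-estimate P(Y) and each P(X_j | Y) from soft counts.
        pi = post.mean(axis=0)
        for j in range(d):
            counts = post.T @ np.eye(n_cats)[X[:, j]] + 1e-3  # smoothed
            theta[:, j, :] = counts / counts.sum(axis=1, keepdims=True)
        if ll - ll_old < tol:
            break
        ll_old = ll
    return ll

def search_cardinality(X, n_cats, max_states=10):
    """Grow the cardinality of the latent variable from 2 upward and stop
    as soon as the score ceases to increase, as the quote describes."""
    n, d = X.shape
    best_k, best_bic = None, -np.inf
    for k in range(2, max_states + 1):
        ll = fit_lcm_em(X, k, n_cats)
        n_params = (k - 1) + k * d * (n_cats - 1)  # free parameters
        bic = ll - 0.5 * n_params * np.log(n)      # higher is better here
        if bic <= best_bic:
            break                                  # score stopped improving
        best_k, best_bic = k, bic
    return best_k, best_bic

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.integers(0, 2, size=(500, 8))          # synthetic binary data
    print(search_cardinality(X, n_cats=2))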