Principal Component Trees and Their Persistent Homology

Authors: Ben Kizaric, Daniel Pimentel-Alarcón

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental results show the effectiveness of both summaries on synthetic and real-world data, including multilingual word embeddings and the latent space of neural networks. Finally, we use PCTs to analyze neural network latent space, word embeddings, and reference image datasets."
Researcher Affiliation | Academia | Ben Kizaric (1,3), Daniel Pimentel-Alarcón (2,3). 1: Department of Electrical Engineering, University of Wisconsin-Madison; 2: Department of Biostatistics, University of Wisconsin-Madison; 3: Wisconsin Institute for Discovery. benkizaric@gmail.com, pimentelalar@wisc.edu
Pseudocode | Yes | Algorithm 1: CLUSTER-TEST. Input: X, D, K. Output: p, the p-value of the Cramér-von Mises test, i.e., the probability that X has angular uniformity; a sufficiently low p indicates subspace clustering. Algorithm 2: EXPAND-NODE. Input: a node Ni and residuals Ri. Output: either one or two child nodes of Ni.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | MNIST Digits vs. Fashion: "For our first experiment, we compare the widely known MNIST Digits and Fashion datasets (Deng 2012; Xiao, Rasul, and Vollgraf 2017)." Multilingual word embeddings: "We use the embeddings presented in (Ferreira, Martins, and Almeida 2016), where each language's embeddings were trained on the TED 2020 dataset (Cettolo, Girardi, and Federico 2012)."
Dataset Splits | No | The paper mentions "Bootstrap sample each dataset into a number of equally sized sub-datasets" in Section 7, but it does not specify explicit training, validation, or test splits in terms of percentages or absolute counts.
Hardware Specification | No | The paper does not provide any specific hardware specifications (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed for replication.
Experiment Setup | Yes | "The PCT construction algorithm uses a few hyperparameters. Firstly, we greedily build the tree up to a maximum number of nodes |T|_max. We also specify a significance level α_test = 0.05, which specifies how confident the test must be that there is subspace-clustered / angle non-uniformity structure in order to split a node in two."
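The CLUSTER-TEST pseudocode row above describes a Cramér-von Mises test for angular uniformity. Since the paper's own code is not available, the following is a minimal illustrative sketch of one way such a test could look, assuming angles are measured in the top-2 principal-component plane; the function name and the exact notion of "angle" are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a CLUSTER-TEST-style uniformity check.
# Assumption: "angular uniformity" is tested on the angles of the data
# projected into its top-2 principal-component plane.
import numpy as np
from scipy.stats import cramervonmises

def cluster_test(X):
    """Return a Cramer-von Mises p-value for angular uniformity of X.

    X : (n_samples, n_features) matrix (e.g., the residuals at a node).
    A sufficiently low p-value suggests subspace-clustered structure.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:2].T                       # top-2 PC coordinates
    angles = np.arctan2(coords[:, 1], coords[:, 0])  # angle in (-pi, pi]
    u = (angles + np.pi) / (2 * np.pi)           # map to [0, 1)
    # Test the mapped angles against the uniform distribution on [0, 1].
    return cramervonmises(u, "uniform").pvalue
```

On isotropic Gaussian data the angles are uniform and the p-value is unremarkable, while data concentrated around a few directions yields a very small p-value, which is the behavior the row above attributes to CLUSTER-TEST.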
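The Experiment Setup row quotes two hyperparameters: a node budget |T|_max and a significance level α_test = 0.05 that gates splits. A minimal sketch of such a greedy loop is shown below; the `Node` class and the `test`/`split` callables are illustrative assumptions, and the sketch covers only the two-child split (the paper's EXPAND-NODE can also emit a single child).

```python
# Hedged sketch of greedy tree construction gated by a significance test.
# Assumptions: `test(data)` returns a p-value (e.g., an angular-uniformity
# test) and `split(data)` partitions the rows into two groups.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    data: object                              # rows reaching this node
    children: List["Node"] = field(default_factory=list)

def build_pct(X, test, split, max_nodes=32, alpha_test=0.05):
    """Grow a tree breadth-first, splitting a node into two children only
    when test(data) < alpha_test, and stopping at max_nodes nodes."""
    root = Node(X)
    frontier = [root]
    n_nodes = 1
    while frontier and n_nodes + 2 <= max_nodes:
        node = frontier.pop(0)
        if len(node.data) < 2:
            continue                          # too few points to split
        if test(node.data) < alpha_test:      # significant clustering found
            left, right = split(node.data)
            node.children = [Node(left), Node(right)]
            frontier.extend(node.children)
            n_nodes += 2
    return root
```

The `n_nodes + 2 <= max_nodes` guard ensures the budget |T|_max is never exceeded, mirroring the "build the tree up to a maximum number of nodes" description in the row above.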