Principal Component Trees and Their Persistent Homology
Authors: Ben Kizaric, Daniel Pimentel-Alarcón
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results show the effectiveness of both summaries on synthetic and real-world data, including multilingual word embeddings and the latent space of neural networks. Finally, we use PCTs to analyze neural network latent space, word embeddings, and reference image datasets. |
| Researcher Affiliation | Academia | Ben Kizaric 1,3, Daniel Pimentel-Alarcón 2,3. 1 Department of Electrical Engineering, University of Wisconsin-Madison; 2 Department of Biostatistics, University of Wisconsin-Madison; 3 Wisconsin Institute For Discovery. benkizaric@gmail.com, pimentelalar@wisc.edu |
| Pseudocode | Yes | Algorithm 1: CLUSTER-TEST. Input: X, D, K. Output: p, the p-value of the Cramér-von Mises test; the probability that X has angular uniformity. Sufficiently low p indicates subspace clustering. Algorithm 2: EXPAND-NODE. Input: A node Ni and residuals Ri. Output: Either one or two child nodes of Ni. (An illustrative sketch of such a test follows the table.) |
| Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described. |
| Open Datasets | Yes | MNIST Digits vs Fashion. For our first experiment, we compare the widely known MNIST Digits and Fashion datasets (Deng 2012; Xiao, Rasul, and Vollgraf 2017). Multilingual word embeddings. We use the embeddings presented in (Ferreira, Martins, and Almeida 2016), where each language's embeddings were trained on the TED 2020 dataset (Cettolo, Girardi, and Federico 2012) |
| Dataset Splits | No | The paper mentions 'Bootstrap sample each dataset into a number of equally sized sub-datasets' in Section 7, but it does not specify explicit training, validation, or test dataset splits in terms of percentages or absolute counts for reproducibility of model training. |
| Hardware Specification | No | The paper does not provide any specific hardware specifications (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed for replication. |
| Experiment Setup | Yes | The PCT construction algorithm uses a few hyperparameters. Firstly, we greedily build the tree up to a maximum number of nodes \|T\|_max. We also specify a significance level α_test = 0.05, which specifies how confident the test must be that there is subspace-clustered / angle non-uniformity structure in order to split a node in two. (A configuration sketch follows the table.) |
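
The quoted pseudocode only gives the interface of CLUSTER-TEST, not its body. Below is a minimal sketch of one way an angular-uniformity check could be wired to SciPy's Cramér-von Mises test; the projection-onto-a-random-direction statistic and the names (`cluster_test`, the `(N, D)` residual layout) are illustrative assumptions, not the paper's exact Algorithm 1.

```python
import numpy as np
from scipy import stats

def cluster_test(X, rng=None):
    """Illustrative angular-uniformity check (not the paper's exact Algorithm 1).

    X : (N, D) array of residual vectors attached to a PCT node.
    Returns a Cramér-von Mises p-value; a low value suggests the directions
    of X are *not* uniform, i.e. there may be subspace-clustered structure.
    """
    rng = np.random.default_rng(rng)
    N, D = X.shape

    # Normalize each sample to a direction on the unit sphere S^{D-1}.
    dirs = X / np.linalg.norm(X, axis=1, keepdims=True)

    # Project onto a random reference direction. If the directions were
    # uniform on the sphere, (t + 1) / 2 would follow Beta((D-1)/2, (D-1)/2).
    u = rng.standard_normal(D)
    u /= np.linalg.norm(u)
    t = dirs @ u
    y = (t + 1.0) / 2.0

    a = (D - 1) / 2.0
    result = stats.cramervonmises(y, "beta", args=(a, a))
    return result.pvalue

# Usage: split a node only when the test is sufficiently confident,
# e.g. p = cluster_test(residuals); split = p < 0.05
```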
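
For reference, the hyperparameters quoted in the experiment-setup row could be collected in a small configuration object. The field names below (`max_nodes`, `alpha_test`) are assumed for illustration; only α_test = 0.05 is a value reported in the paper, and the quoted excerpt does not fix a value for \|T\|_max.

```python
from dataclasses import dataclass

@dataclass
class PCTConfig:
    # Maximum number of nodes |T|_max the tree is greedily grown to
    # (illustrative default; the quoted excerpt gives no specific value).
    max_nodes: int = 64
    # Significance level alpha_test for the CLUSTER-TEST split decision
    # (value reported in the paper).
    alpha_test: float = 0.05
```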