Principal Component Trees and Their Persistent Homology

Authors: Ben Kizaric, Daniel Pimentel-Alarcón

AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental results show the effectiveness of both summaries on synthetic and real-world data, including multilingual word embeddings and the latent space of neural networks. Finally, we use PCTs to analyze neural network latent space, word embeddings, and reference image datasets."
Researcher Affiliation | Academia | Ben Kizaric (1,3), Daniel Pimentel-Alarcón (2,3). 1: Department of Electrical Engineering, University of Wisconsin-Madison; 2: Department of Biostatistics, University of Wisconsin-Madison; 3: Wisconsin Institute for Discovery. benkizaric@gmail.com, pimentelalar@wisc.edu
Pseudocode | Yes | Algorithm 1: CLUSTER-TEST. Input: X, D, K. Output: p, the p-value of the Cramér-von Mises test, i.e., the probability that X has angular uniformity; a sufficiently low p indicates subspace clustering. Algorithm 2: EXPAND-NODE. Input: a node Ni and residuals Ri. Output: either one or two child nodes of Ni.
Open Source Code | No | The paper does not provide any statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | MNIST Digits vs. Fashion: "For our first experiment, we compare the widely known MNIST Digits and Fashion datasets (Deng 2012; Xiao, Rasul, and Vollgraf 2017)." Multilingual word embeddings: "We use the embeddings presented in (Ferreira, Martins, and Almeida 2016), where each language's embeddings were trained on the TED 2020 dataset (Cettolo, Girardi, and Federico 2012)."
Dataset Splits | No | The paper mentions "Bootstrap sample each dataset into a number of equally sized sub-datasets" in Section 7, but it does not specify explicit training, validation, or test splits in terms of percentages or absolute counts.
Hardware Specification | No | The paper does not provide any specific hardware specifications (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions) that would be needed for replication.
Experiment Setup | Yes | "The PCT construction algorithm uses a few hyperparameters. Firstly, we greedily build the tree up to a maximum number of nodes |T|_max. We also specify a significance level α_test = 0.05, which specifies how confident the test must be that there is subspace-clustered / angle non-uniformity structure in order to split a node in two."
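The CLUSTER-TEST pseudocode row above describes a Cramér-von Mises test for angular uniformity. Since the paper's own code is not available, the following is a minimal illustrative sketch of one way such a test could look, assuming angles are measured in the top-2 principal-component plane; the function name and the exact notion of "angle" are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a CLUSTER-TEST-style uniformity check.
# Assumption: "angular uniformity" is tested on the angles of the data
# projected into its top-2 principal-component plane.
import numpy as np
from scipy.stats import cramervonmises

def cluster_test(X):
    """Return a Cramer-von Mises p-value for angular uniformity of X.

    X : (n_samples, n_features) matrix (e.g., the residuals at a node).
    A sufficiently low p-value suggests subspace-clustered structure.
    """
    Xc = X - X.mean(axis=0)                      # center the data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    coords = Xc @ Vt[:2].T                       # top-2 PC coordinates
    angles = np.arctan2(coords[:, 1], coords[:, 0])  # angle in (-pi, pi]
    u = (angles + np.pi) / (2 * np.pi)           # map to [0, 1)
    # Test the mapped angles against the uniform distribution on [0, 1].
    return cramervonmises(u, "uniform").pvalue
```

On isotropic Gaussian data the angles are uniform and the p-value is unremarkable, while data concentrated around a few directions yields a very small p-value, which is the behavior the row above attributes to CLUSTER-TEST.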
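The Experiment Setup row quotes two hyperparameters: a node budget |T|_max and a significance level α_test = 0.05 that gates splits. A minimal sketch of such a greedy loop is shown below; the `Node` class and the `test`/`split` callables are illustrative assumptions, and the sketch covers only the two-child split (the paper's EXPAND-NODE can also emit a single child).

```python
# Hedged sketch of greedy tree construction gated by a significance test.
# Assumptions: `test(data)` returns a p-value (e.g., an angular-uniformity
# test) and `split(data)` partitions the rows into two groups.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    data: object                              # rows reaching this node
    children: List["Node"] = field(default_factory=list)

def build_pct(X, test, split, max_nodes=32, alpha_test=0.05):
    """Grow a tree breadth-first, splitting a node into two children only
    when test(data) < alpha_test, and stopping at max_nodes nodes."""
    root = Node(X)
    frontier = [root]
    n_nodes = 1
    while frontier and n_nodes + 2 <= max_nodes:
        node = frontier.pop(0)
        if len(node.data) < 2:
            continue                          # too few points to split
        if test(node.data) < alpha_test:      # significant clustering found
            left, right = split(node.data)
            node.children = [Node(left), Node(right)]
            frontier.extend(node.children)
            n_nodes += 2
    return root
```

The `n_nodes + 2 <= max_nodes` guard ensures the budget |T|_max is never exceeded, mirroring the "build the tree up to a maximum number of nodes" description in the row above.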