The tree autoencoder model, with application to hierarchical data visualization
Authors: Miguel Á. Carreira-Perpiñán, Kuat Gazizov
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show PCA trees are able to identify a wealth of low-dimensional and cluster structure in image and document datasets. ... From Section 6 (Experiments): In summary, we show the following: we confirm our theoretical predictions about monotonic decrease of the objective function and training time; compare the reconstruction error with PCA; and demonstrate how PCA trees are highly interpretable and extract significant structure from complex datasets. We use several datasets of different types (images, documents), some of which appear in the appendix. |
| Researcher Affiliation | Academia | Miguel Á. Carreira-Perpiñán Dept. of Computer Science and Engineering University of California, Merced mcarreira-perpinan@ucmerced.edu Kuat Gazizov Dept. of Computer Science and Engineering University of California, Merced kgazizov@ucmerced.edu |
| Pseudocode | Yes | Figure 6: Pseudocode for the PCA tree optimization algorithm. |
| Open Source Code | No | We plan to make available code in the future. |
| Open Datasets | Yes | MNIST and Fashion MNIST... 20newsgroups... Letter Recognition task for classifying 26 capital letters in the English alphabet [25]... Amazon Reviews... We use scikit-learn's CountVectorizer to extract unigrams from the raw texts. [25] is M. Lichman. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2013. (A hedged CountVectorizer sketch follows the table.) |
| Dataset Splits | No | The paper states it uses the training sets of certain datasets (e.g., 'We use only the training set of size (N = 60000)' for MNIST/Fashion MNIST), but does not explicitly specify training/validation/test splits, percentages, or cite predefined splits for its experiments. (A hedged MNIST-loading sketch follows the table.) |
| Hardware Specification | Yes | All experiments are conducted on an Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 256GB RAM. |
| Software Dependencies | No | The paper mentions software like C++ and Python, and libraries such as LIBLINEAR and scikit-learn, but does not provide specific version numbers for these software dependencies (e.g., 'LIBLINEAR [10]' without a version number, 'scikit-learn library' without a version). |
| Experiment Setup | Yes | We use the following hyperparameters: λ = 10 for MNIST and Fashion MNIST, λ = 1 for Letter, and λ = 0.01 for 20newsgroups and Amazon Reviews. The algorithm includes an early stopping criterion that terminates training if there is no decrease in the training error for 3 iterations with a change of less than 10⁻³. For t-SNE, we use the scikit-learn implementation (the default number of iterations is 1,000). For UMAP... (the default number of iterations is 500 if the dataset has fewer than 10,000 points and 200 otherwise). (Hedged sketches of the early-stopping rule and the baseline defaults follow the table.) |
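
The text preprocessing quoted under Open Datasets is reproducible from scikit-learn alone. A minimal sketch, assuming the vectorizer's default settings (the paper does not state them):

```python
# Sketch of the described preprocessing: extract unigram counts from raw
# texts with scikit-learn's CountVectorizer. The paper does not give the
# vectorizer's settings, so defaults are assumed here.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

# Load raw 20newsgroups texts (training portion, as the paper uses training sets).
texts = fetch_20newsgroups(subset="train").data

# ngram_range=(1, 1) restricts features to unigrams (also the default).
vectorizer = CountVectorizer(ngram_range=(1, 1))
X = vectorizer.fit_transform(texts)  # sparse document-term count matrix
print(X.shape, len(vectorizer.get_feature_names_out()))
```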
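The Dataset Splits row notes the paper only says it uses the MNIST/Fashion MNIST training set of size N = 60000. A minimal sketch of recovering that standard split via OpenML, assuming the canonical sample ordering (training set first, test set last):

```python
# Hedged sketch: the 'mnist_784' OpenML dataset stores all 70,000 samples
# with the standard 60,000-sample training set first; slicing recovers the
# split the paper appears to use. This ordering is an assumption about the
# paper's setup, not something it states.
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X_train, y_train = mnist.data[:60000], mnist.target[:60000]
print(X_train.shape)  # (60000, 784)
```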
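The early-stopping rule in Experiment Setup is stated only in prose. A hypothetical sketch of one way to implement it; `train_one_iteration` is a placeholder, not the paper's API:

```python
# Hypothetical sketch of the stated early-stopping rule: terminate when the
# training error has not decreased by more than 1e-3 for 3 consecutive
# iterations. `model.train_one_iteration` stands in for one pass of the
# PCA tree optimization algorithm described in the paper.
def train_with_early_stopping(model, data, max_iters=100, tol=1e-3, patience=3):
    best_error = float("inf")
    stall = 0  # consecutive iterations without sufficient improvement
    for it in range(max_iters):
        error = model.train_one_iteration(data)  # placeholder step
        if best_error - error < tol:
            stall += 1
            if stall >= patience:
                break  # no meaningful decrease for `patience` iterations
        else:
            stall = 0
        best_error = min(best_error, error)
    return model
```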
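The quoted baseline defaults can be checked directly. A minimal sketch, assuming the umap-learn package for UMAP (the paper names no specific package):

```python
# Minimal sketch of the quoted baseline settings. scikit-learn's TSNE runs
# 1,000 optimization iterations by default; umap-learn's UMAP resolves its
# default n_epochs=None to 500 for datasets under 10,000 points and 200
# otherwise, matching the paper's description.
import numpy as np
from sklearn.manifold import TSNE
import umap  # umap-learn package (assumed)

X = np.random.rand(1000, 50)  # stand-in data

tsne_embedding = TSNE(n_components=2).fit_transform(X)
umap_embedding = umap.UMAP(n_components=2).fit_transform(X)
print(tsne_embedding.shape, umap_embedding.shape)  # (1000, 2) each
```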