The tree autoencoder model, with application to hierarchical data visualization
Authors: Miguel Á. Carreira-Perpiñán, Kuat Gazizov
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In experiments, we show PCA trees are able to identify a wealth of low-dimensional and cluster structure in image and document datasets. ... From Section 6 (Experiments): In summary, we show the following: we confirm our theoretical predictions about monotonic decrease of the objective function and training time; compare the reconstruction error with PCA; and demonstrate how PCA trees are highly interpretable and extract significant structure from complex datasets. We use several datasets of different types (images, documents), some of which appear in the appendix. |
| Researcher Affiliation | Academia | Miguel Á. Carreira-Perpiñán Dept. of Computer Science and Engineering University of California, Merced mcarreira-perpinan@ucmerced.edu Kuat Gazizov Dept. of Computer Science and Engineering University of California, Merced kgazizov@ucmerced.edu |
| Pseudocode | Yes | Figure 6: Pseudocode for the PCA tree optimization algorithm. |
| Open Source Code | No | We plan to make available code in the future. |
| Open Datasets | Yes | MNIST and Fashion MNIST... 20newsgroups... Letter Recognition task for classifying 26 capital letters in the English alphabet [25]... Amazon Reviews... We use scikit-learn's CountVectorizer to extract unigrams from the raw texts. [25] is M. Lichman. UCI machine learning repository. http://archive.ics.uci.edu/ml, 2013. (A hedged CountVectorizer sketch follows the table.) |
| Dataset Splits | No | The paper states it uses the training sets of certain datasets (e.g., 'We use only the training set of size (N = 60000)' for MNIST/Fashion MNIST), but does not explicitly specify training/validation/test splits, percentages, or cite predefined splits for its experiments. (A hedged MNIST-loading sketch follows the table.) |
| Hardware Specification | Yes | All experiments are conducted on an Intel(R) Xeon(R) CPU E5-2699 v3 @ 2.30GHz, 256GB RAM. |
| Software Dependencies | No | The paper mentions software like C++ and Python, and libraries such as LIBLINEAR and scikit-learn, but does not provide specific version numbers for these software dependencies (e.g., 'LIBLINEAR [10]' without a version number, 'scikit-learn library' without a version). |
| Experiment Setup | Yes | We use the following hyperparameters: λ = 10 for MNIST and Fashion MNIST, λ = 1 for Letter, and λ = 0.01 for 20newsgroups and Amazon Reviews. The algorithm includes an early stopping criterion that terminates training if there is no decrease in the training error for 3 iterations with a change of less than 10⁻³. For t-SNE, we use the scikit-learn implementation (the default number of iterations is 1,000). For UMAP... (the default number of iterations is 500 if the dataset has fewer than 10,000 points and 200 otherwise). (Hedged sketches of the early-stopping rule and the baseline defaults follow the table.) |
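
The text preprocessing quoted under Open Datasets is reproducible from scikit-learn alone. A minimal sketch, assuming the vectorizer's default settings (the paper does not state them):

```python
# Sketch of the described preprocessing: extract unigram counts from raw
# texts with scikit-learn's CountVectorizer. The paper does not give the
# vectorizer's settings, so defaults are assumed here.
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import CountVectorizer

# Load raw 20newsgroups texts (training portion, as the paper uses training sets).
texts = fetch_20newsgroups(subset="train").data

# ngram_range=(1, 1) restricts features to unigrams (also the default).
vectorizer = CountVectorizer(ngram_range=(1, 1))
X = vectorizer.fit_transform(texts)  # sparse document-term count matrix
print(X.shape, len(vectorizer.get_feature_names_out()))
```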
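The Dataset Splits row notes the paper only says it uses the MNIST/Fashion MNIST training set of size N = 60000. A minimal sketch of recovering that standard split via OpenML, assuming the canonical sample ordering (training set first, test set last):

```python
# Hedged sketch: the 'mnist_784' OpenML dataset stores all 70,000 samples
# with the standard 60,000-sample training set first; slicing recovers the
# split the paper appears to use. This ordering is an assumption about the
# paper's setup, not something it states.
from sklearn.datasets import fetch_openml

mnist = fetch_openml("mnist_784", version=1, as_frame=False)
X_train, y_train = mnist.data[:60000], mnist.target[:60000]
print(X_train.shape)  # (60000, 784)
```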
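The early-stopping rule in Experiment Setup is stated only in prose. A hypothetical sketch of one way to implement it; `train_one_iteration` is a placeholder, not the paper's API:

```python
# Hypothetical sketch of the stated early-stopping rule: terminate when the
# training error has not decreased by more than 1e-3 for 3 consecutive
# iterations. `model.train_one_iteration` stands in for one pass of the
# PCA tree optimization algorithm described in the paper.
def train_with_early_stopping(model, data, max_iters=100, tol=1e-3, patience=3):
    best_error = float("inf")
    stall = 0  # consecutive iterations without sufficient improvement
    for it in range(max_iters):
        error = model.train_one_iteration(data)  # placeholder step
        if best_error - error < tol:
            stall += 1
            if stall >= patience:
                break  # no meaningful decrease for `patience` iterations
        else:
            stall = 0
        best_error = min(best_error, error)
    return model
```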
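The quoted baseline defaults can be checked directly. A minimal sketch, assuming the umap-learn package for UMAP (the paper names no specific package):

```python
# Minimal sketch of the quoted baseline settings. scikit-learn's TSNE runs
# 1,000 optimization iterations by default; umap-learn's UMAP resolves its
# default n_epochs=None to 500 for datasets under 10,000 points and 200
# otherwise, matching the paper's description.
import numpy as np
from sklearn.manifold import TSNE
import umap  # umap-learn package (assumed)

X = np.random.rand(1000, 50)  # stand-in data

tsne_embedding = TSNE(n_components=2).fit_transform(X)
umap_embedding = umap.UMAP(n_components=2).fit_transform(X)
print(tsne_embedding.shape, umap_embedding.shape)  # (1000, 2) each
```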