Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Generalizable Spectral Embedding with an Application to UMAP
Authors: Nir Ben-Ari, Amitai Yacobi, Uri Shaham
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate Sep-SpectralNet's ability to consistently approximate and generalize SE, while maintaining SpectralNet's scalability. Additionally, we show how Sep-SpectralNet can be leveraged to enable generalizable UMAP visualization. (Section 5, Experiments) In this section, we demonstrate Sep-SpectralNet's ability to approximate and generalize the SE using four real-world datasets: CIFAR10 (via their CLIP embedding); Appliances Energy Prediction dataset (Candanedo, 2017); Kuzushiji-MNIST (KMNIST) dataset (Clanuwat et al., 2018); Parkinson's Telemonitoring dataset (Tsanas & Little, 2009). Fig. 5 presents our results on the real-world datasets. Sep-SpectralNet's output is used directly, while SpectralNet's predicted eigenvectors are re-sorted to minimize the mean sin² distance. The results clearly show that Sep-SpectralNet consistently produces significantly more accurate SE approximations compared to SpectralNet, due to the improved separation of the eigenvectors. Tab. 2 presents the kNN results on the real-world datasets. |
| Researcher Affiliation | Academia | Nir Ben-Ari (EMAIL), Department of Computer Science, Bar-Ilan University; Amitai Yacobi (EMAIL), Department of Computer Science, Bar-Ilan University; Uri Shaham (EMAIL), Department of Computer Science, Bar-Ilan University |
| Pseudocode | Yes | Algorithms Layout. Our end-to-end training approach is summarized in Algorithms 1 and 2 in App. B. Algorithm 1: SpectralNet training (Shaham et al., 2018); Algorithm 2: Eigenvectors separation |
| Open Source Code | Yes | Sep-SpectralNet: https://github.com/shaham-lab/GrEASE; NUMAP: https://github.com/shaham-lab/NUMAP |
| Open Datasets | Yes | CIFAR10 (via their CLIP embedding); Appliances Energy Prediction dataset (Candanedo, 2017); Kuzushiji-MNIST (KMNIST) dataset (Clanuwat et al., 2018); Parkinson's Telemonitoring dataset (Tsanas & Little, 2009). We consider four real-world datasets: CIFAR10 (via their CLIP embedding); Appliances Energy Prediction dataset; Wine (Aeberhard & Forina, 1992); Banknote Authentication (Lohweg, 2012). Tab. 9 extends Tab. 4 with two additional datasets: MNIST (Deng, 2012) and Fashion-MNIST (Xiao et al., 2017). |
| Dataset Splits | Yes | We used a train-test split of 80-20 for all datasets. |
| Hardware Specification | Yes | We ran the experiments using GPU: NVIDIA A100 80GB PCIe; CPU: Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz; |
| Software Dependencies | No | To compute the ground truth SE on the train set and its corresponding eigenvalues, we constructed an affinity matrix W from the train set (as detailed in Appendix C.2), with a number of neighbors detailed in Table 12. After constructing W, we computed the leading k eigenvectors of its corresponding unnormalized Laplacian L = D − W via Python's NumPy SVD or SciPy's LOBPCG (depending on the size). To run UMAP, we used Python's umap-learn implementation (UMAP's official implementation). For Parametric UMAP we used the PyTorch implementation (Liu, 2024). |
| Experiment Setup | Yes | The architectures of Sep-SpectralNet's and SpectralNet's networks in all of the experiments were as follows: size = 256, ReLU; size = 256, ReLU; size = 512, ReLU; size = k + 1, orthonorm. NUMAP's second NN and PUMAP's NN architecture for all datasets was: size = 200, ReLU; size = 200, ReLU; size = 200, ReLU; size = 2. The SE dimensions for NUMAP were: CIFAR10: 10; Appliances: 5; Wine: 10; Banknote: 3; MNIST: 10; Fashion-MNIST: 10. For the datasets in Fig. 1, from top to bottom: Circles: 5; Cylinders: 11; Line: 2. The learning rate policy for Sep-SpectralNet and SpectralNet is determined by monitoring the loss on a validation set (a random subset of the training set); once the validation loss did not improve for a specified number of patience epochs, we divided the learning rate by 10. Training stopped once the learning rate reached 10^-7. In particular, we used the following approximation to determine the patience epochs, where n is the number of samples and m is the batch size: if n/m <= 25, we chose the patience to be 10; otherwise, the patience decreases as max(1, 250m/n) (i.e., the number of iterations is the deciding factor). Table 12 (technical details in the Sep-SpectralNet experiments for all datasets): Moon: batch size 2048, n_neighbors 20, initial LR 10^-2; CIFAR10: batch size 2048, n_neighbors 20, initial LR 10^-2; Appliances: batch size 2048, n_neighbors 20, initial LR 10^-3; KMNIST: batch size 2048, n_neighbors 20, initial LR 10^-3; Parkinson's: batch size 512, n_neighbors 5, initial LR 10^-2; optimizer Adam for all. |
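The ground-truth computation quoted under Software Dependencies (build an affinity matrix W, form the unnormalized Laplacian L = D − W, take its leading k eigenvectors) can be sketched as below. This is a minimal dense-matrix illustration, not the paper's code: it uses `numpy.linalg.eigh` instead of the NumPy SVD / SciPy LOBPCG routines mentioned in the excerpt, and the function name and toy affinity matrix are ours.

```python
import numpy as np

def spectral_embedding(W: np.ndarray, k: int):
    """Leading k eigenpairs of the unnormalized graph Laplacian L = D - W.

    W: symmetric non-negative affinity matrix (n x n).
    Returns eigenvalues (ascending) and the corresponding eigenvectors;
    column 0 is the trivial (near-)constant eigenvector with eigenvalue ~0.
    """
    D = np.diag(W.sum(axis=1))            # degree matrix
    L = D - W                             # unnormalized Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)  # symmetric eigendecomposition
    return eigvals[:k], eigvecs[:, :k]

# Toy example: complete graph on 4 nodes (all pairwise affinities equal 1).
W = np.ones((4, 4)) - np.eye(4)
vals, vecs = spectral_embedding(W, k=2)
print(vals)  # smallest eigenvalue is ~0, as for any connected graph
```

For the large, sparse k-NN affinity matrices the paper uses, a sparse solver such as `scipy.sparse.linalg.lobpcg` (named in the excerpt) would replace the dense `eigh` call.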
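The learning-rate policy in the Experiment Setup row (patience 10 when n/m <= 25, otherwise max(1, 250m/n); divide the LR by 10 on plateau; stop at 10^-7) can be sketched as follows. The function names and the plateau-tracking interface are our assumptions for illustration, not taken from the released code.

```python
def patience_epochs(n: int, m: int) -> int:
    """Patience (in epochs) given dataset size n and batch size m.

    Approximates a fixed iteration budget: 10 epochs for small datasets
    (n/m <= 25), shrinking as max(1, 250m/n) when n grows.
    """
    if n / m <= 25:
        return 10
    return max(1, int(250 * m / n))

def decay_lr_on_plateau(lr: float, epochs_since_improvement: int,
                        patience: int, min_lr: float = 1e-7):
    """Divide the LR by 10 once validation loss stalls for `patience` epochs.

    Returns the (possibly reduced) LR and a flag telling the caller to stop
    training once the LR has reached min_lr.
    """
    if epochs_since_improvement >= patience:
        lr = lr / 10.0
    return lr, lr <= min_lr

print(patience_epochs(2_000, 2048))    # n/m <= 25  -> 10
print(patience_epochs(100_000, 2048))  # int(250 * 2048 / 100000) -> 5
```

The same plateau behavior is available off the shelf as PyTorch's `torch.optim.lr_scheduler.ReduceLROnPlateau` (with `factor=0.1` and a matching `patience`), which may be what the training loop actually uses.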