SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities

Authors: Hugues Van Assel, Titouan Vayer, Rémi Flamary, Nicolas Courty

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show its clear superiority to existing approaches with several indicators on both synthetic and real-world datasets." (Section 5, Numerical experiments:) "This section aims to illustrate the performance of the proposed affinity matrix P^se (SEA) and DR method SNEkhorn at faithfully representing dependencies and clusters in low dimensions."
Researcher Affiliation | Academia | Hugues Van Assel (ENS de Lyon, CNRS, UMPA UMR 5669, hugues.van_assel@ens-lyon.fr); Titouan Vayer (Univ. Lyon, ENS de Lyon, UCBL, CNRS, Inria, LIP UMR 5668, titouan.vayer@inria.fr); Rémi Flamary (École polytechnique, IP Paris, CNRS, CMAP UMR 7641, remi.flamary@polytechnique.edu); Nicolas Courty (Université Bretagne Sud, CNRS, IRISA UMR 6074, nicolas.courty@irisa.fr)
Pseudocode | Yes | $\forall i,\; [f_Z]_i \leftarrow \frac{1}{2}\big([f_Z]_i - \log \sum_k \exp([f_Z]_k - [C_Z]_{ki})\big)$ (Sinkhorn). (A PyTorch sketch of this update is given after the table.)
Open Source Code | Yes | Our code is available at https://github.com/PythonOT/SNEkhorn.
Open Datasets | Yes | For images, we use COIL-20 [34], OLIVETTI faces [12], UMIST [15] and CIFAR-10 [22]. For CIFAR, we experiment with features obtained from the last hidden layer of a pre-trained ResNet [38], while for the other three datasets we take as input the raw pixel data. Regarding genomics data, we consider the Curated Microarray Database (CuMiDa) [11], made of microarray datasets for various types of cancer, as well as the preprocessed SNAREseq (chromatin accessibility) and scGEM (gene expression) datasets used in [9].
Dataset Splits | No | The paper mentions using grid-search for hyperparameter tuning and initializing runs with independent N(0,1) coordinates, but does not explicitly provide details about training, validation, and test dataset splits (e.g., percentages, counts, or explicit references to standard splits).
Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU or CPU models, or cloud instances) used for running the experiments.
Software Dependencies | No | The paper mentions implementing methods in 'PyTorch [35]' and using 'scikit-learn [36]', but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | For perplexity parameters, we test all multiples of 10 in the interval [10, min(n, 300)], where n is the number of samples in the dataset. We use the same grid for the k of the self-tuning affinity P^st [53] and for the n_neighbors parameter of UMAP. For scalar bandwidths, we consider powers of 10 such that the corresponding affinities' average perplexity falls within the perplexity range. All models were optimized using ADAM [18] with default parameters and the same stopping criterion: the algorithm stops whenever the relative variation of the loss becomes smaller than 10^-5. (A sketch of this optimization loop is given below.)
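The Pseudocode row above quotes the paper's symmetric Sinkhorn fixed-point update on the dual potential f_Z for a cost matrix C_Z. Below is a minimal PyTorch sketch of that iteration; the function name symmetric_sinkhorn and the fixed iteration count are illustrative assumptions, not the authors' implementation (see the linked repository for that).

    import torch

    def symmetric_sinkhorn(C, n_iter=100):
        # Hypothetical sketch of the (Sinkhorn) update quoted above,
        # not the authors' code. C: (n, n) symmetric cost matrix C_Z.
        f = torch.zeros(C.shape[0], dtype=C.dtype)
        for _ in range(n_iter):
            # f_i <- (1/2) * (f_i - log sum_k exp(f_k - C_ki))
            f = 0.5 * (f - torch.logsumexp(f[:, None] - C, dim=0))
        # Doubly stochastic affinity P = exp(f_i + f_j - C_ij).
        return torch.exp(f[:, None] + f[None, :] - C)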
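The Experiment Setup row describes a hyperparameter grid and a relative-variation stopping rule for ADAM. The sketch below illustrates both under stated assumptions: optimize_embedding, loss_fn, and max_iter are hypothetical names and defaults introduced for this example, not the paper's code.

    import torch

    def optimize_embedding(loss_fn, Z, tol=1e-5, max_iter=10000):
        # Hypothetical sketch: ADAM with default parameters, stopping once the
        # relative variation of the loss drops below 1e-5 (as in the paper).
        # Z: (n, d) embedding tensor created with requires_grad=True.
        opt = torch.optim.Adam([Z])
        prev = None
        for _ in range(max_iter):
            opt.zero_grad()
            loss = loss_fn(Z)
            loss.backward()
            opt.step()
            cur = loss.item()
            if prev is not None and abs(cur - prev) / abs(prev) < tol:
                break
            prev = cur
        return Z

    # Perplexity grid from the quoted setup: multiples of 10 in [10, min(n, 300)].
    n = 1000  # illustrative sample count
    perplexity_grid = list(range(10, min(n, 300) + 1, 10))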