SNEkhorn: Dimension Reduction with Symmetric Entropic Affinities
Authors: Hugues Van Assel, Titouan Vayer, Rémi Flamary, Nicolas Courty
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show its clear superiority to existing approaches with several indicators on both synthetic and real-world datasets. (Section 5, Numerical experiments) This section aims to illustrate the performance of the proposed affinity matrix P^se (SEA) and DR method SNEkhorn at faithfully representing dependencies and clusters in low dimensions. |
| Researcher Affiliation | Academia | Hugues Van Assel (ENS de Lyon, CNRS, UMPA UMR 5669, hugues.van_assel@ens-lyon.fr); Titouan Vayer (Univ. Lyon, ENS de Lyon, UCBL, CNRS, Inria, LIP UMR 5668, titouan.vayer@inria.fr); Rémi Flamary (École polytechnique, IP Paris, CNRS, CMAP UMR 7641, remi.flamary@polytechnique.edu); Nicolas Courty (Université Bretagne Sud, CNRS, IRISA UMR 6074, nicolas.courty@irisa.fr) |
| Pseudocode | Yes | The paper states the symmetric Sinkhorn fixed-point update (a PyTorch sketch follows this table): $\forall i,\; [f_Z]_i \leftarrow \frac{1}{2}\left([f_Z]_i - \log \sum_k \exp\left([f_Z]_k - [C_Z]_{ki}\right)\right)$ (Sinkhorn) |
| Open Source Code | Yes | Our code is available at https://github.com/PythonOT/SNEkhorn. |
| Open Datasets | Yes | For images, we use COIL-20 [34], OLIVETTI faces [12], UMIST [15] and CIFAR-10 [22]. For CIFAR, we experiment with features obtained from the last hidden layer of a pre-trained ResNet [38], while for the other three datasets we take as input the raw pixel data. Regarding genomics data, we consider the Curated Microarray Database (CuMiDa) [11], made of microarray datasets for various types of cancer, as well as the preprocessed SNAREseq (chromatin accessibility) and scGEM (gene expression) datasets used in [9]. |
| Dataset Splits | No | The paper mentions using grid-search for hyperparameter tuning and initializing runs with independent N(0,1) coordinates, but does not explicitly provide details about training, validation, and test dataset splits (e.g., percentages, counts, or explicit references to standard splits). |
| Hardware Specification | No | The paper does not specify any particular hardware (e.g., GPU, CPU models, or cloud instances) used for running the experiments. |
| Software Dependencies | No | The paper mentions implementing methods in 'PyTorch [35]' and using 'scikit-learn [36]' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For perplexity parameters, we test all multiples of 10 in the interval [10, min(n, 300)], where n is the number of samples in the dataset. We use the same grid for the k of the self-tuning affinity P^st [53] and for the n_neighbors parameter of UMAP. For scalar bandwidths, we consider powers of 10 such that the corresponding affinity's average perplexity belongs to the perplexity range. All models were optimized using ADAM [18] with default parameters and the same stopping criterion: the algorithm stops whenever the relative variation of the loss becomes smaller than $10^{-5}$ (a sketch of this loop follows this table). |
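For reference, here is a minimal PyTorch sketch of the symmetric Sinkhorn fixed-point iteration quoted in the Pseudocode row. The function name, tolerance, and iteration cap are our own choices rather than the paper's; it assumes a symmetric cost matrix `C` (e.g. squared pairwise distances of the embedding `Z`).

```python
import torch

def symmetric_sinkhorn(C, n_iter=100, tol=1e-6):
    """Iterate [f]_i <- 0.5 * ([f]_i - log sum_k exp([f]_k - C[k, i]))
    until convergence, then return the doubly stochastic affinity
    Q = exp(f_i + f_j - C_ij). Assumes C is symmetric; names and
    tolerances here are illustrative, not from the paper."""
    f = torch.zeros(C.shape[0], dtype=C.dtype, device=C.device)
    for _ in range(n_iter):
        # entry [k, i] of (f[:, None] - C) is f_k - C_ki; reduce over k via logsumexp
        f_new = 0.5 * (f - torch.logsumexp(f[:, None] - C, dim=0))
        if torch.max(torch.abs(f_new - f)) < tol:
            f = f_new
            break
        f = f_new
    return torch.exp(f[:, None] + f[None, :] - C)
```

At the fixed point, every row (and, by symmetry, every column) of the returned matrix sums to one, e.g. `Q = symmetric_sinkhorn(torch.cdist(Z, Z) ** 2)`.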
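Similarly, a sketch of the optimization loop with the stopping rule quoted in the Experiment Setup row, assuming a generic loss closure `loss_fn` (hypothetical): the paper only specifies ADAM with default parameters, independent N(0, 1) initialization, and stopping once the relative loss variation drops below 10^-5.

```python
import torch

def optimize_embedding(loss_fn, n, dim=2, rel_tol=1e-5, max_iter=10_000):
    """Run ADAM with default parameters until the relative variation of the
    loss falls below rel_tol. `loss_fn`, `dim` and `max_iter` are
    illustrative placeholders, not values from the paper."""
    Z = torch.randn(n, dim, requires_grad=True)  # independent N(0, 1) coordinates
    optimizer = torch.optim.Adam([Z])            # default parameters (lr=1e-3)
    prev = None
    for _ in range(max_iter):
        optimizer.zero_grad()
        loss = loss_fn(Z)
        loss.backward()
        optimizer.step()
        cur = loss.item()
        # paper's criterion: stop when |loss_t - loss_{t-1}| / |loss_{t-1}| < 1e-5
        if prev is not None and abs(cur - prev) / abs(prev) < rel_tol:
            break
        prev = cur
    return Z.detach()
```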