Dimensionality Reduction has Quantifiable Imperfections: Two Geometric Bounds

Authors: Kry Lui, Gavin Weiguang Ding, Ruitong Huang, Robert McCann

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this section, we demonstrate on a synthetic dataset that our lower bound in Theorem 4 can serve as reasonable guidance for selecting the retrieval neighborhood radius r_V, with an emphasis on high precision. The simulation computes the optimal r_V by minimizing the lower bound in Theorem 4 for a given relevant neighborhood radius r_U and embedding dimension m.
Researcher Affiliation | Collaboration | Kry Yik Chau Lui (Borealis AI, Canada; yikchau.y.lui@borealisai.com); Gavin Weiguang Ding (Borealis AI, Canada; gavin.ding@borealisai.com); Ruitong Huang (Borealis AI, Canada; ruitong.huang@borealisai.com); Robert J. McCann (Department of Mathematics, University of Toronto, Canada; mccann@math.toronto.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide access to its own source code. It only mentions using the 'python optimal transport library (POT) [12]', which is a third-party tool.
Open Datasets | No | The paper states that the authors 'generate 10000 uniformly distributed samples in a 10-dimensional unit ℓ2-ball'. This is a synthetic dataset generated by the authors, and no access information (link, DOI, or citation) is provided for it.
Dataset Splits | No | The paper describes generating a synthetic dataset of '10000 uniformly distributed samples' but does not specify any training, validation, or test splits.
Hardware Specification | No | The paper does not provide any hardware details for its simulations or experiments.
Software Dependencies | No | The paper mentions that 'Our code is based on python optimal transport library (POT) [12]' but does not give a version number for POT or list any other software dependencies.
Experiment Setup | Yes | Specifically, we generate 10000 uniformly distributed samples in a 10-dimensional unit ℓ2-ball. We choose r_U such that on average each data point has 500 neighbors inside B_{r_U}. We then linearly project these 10-dimensional points into lower-dimensional spaces with embedding dimension m from 1 to 9. For each m, a different r_V is used to calculate discrete precision and recall.
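The experiment setup described in the table can be sketched roughly as follows. This is a minimal, hypothetical reconstruction, not the authors' code (which is not released), and it makes several assumptions: the "linear projection" is taken to be the orthogonal projection onto the first m coordinates; discrete precision and recall use the standard information-retrieval definitions (relevant points lie within r_U in the input space, retrieved points lie within r_V in the embedding); and the defaults are reduced to 2000 points with 100 average neighbors instead of 10000 and 500, so the dense distance matrices fit in memory. All function names (sample_unit_ball, pairwise_dist, radius_for_avg_neighbors, precision_recall, run) are illustrative. The sketch does not implement the Theorem 4 lower bound. It only sweeps r_V and reports the resulting precision and recall.

```python
import numpy as np


def sample_unit_ball(n, d, rng):
    """Draw n points uniformly from the d-dimensional unit l2-ball.

    A normalised Gaussian vector is a uniform direction; scaling it by
    U**(1/d), U ~ Uniform(0, 1), gives the correct radial density.
    """
    g = rng.standard_normal((n, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    radii = rng.random(n) ** (1.0 / d)
    return directions * radii[:, None]


def pairwise_dist(x):
    """Dense Euclidean distance matrix (O(n^2) memory; fine for small n)."""
    sq = np.sum(x * x, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    return np.sqrt(np.maximum(d2, 0.0))


def radius_for_avg_neighbors(dist, k):
    """Radius at which a point has about k neighbours (itself excluded) on average."""
    n = dist.shape[0]
    off_diag = dist[~np.eye(n, dtype=bool)]
    return float(np.quantile(off_diag, k / (n - 1)))


def precision_recall(dist_in, dist_out, r_u, r_v):
    """Mean discrete precision and recall over all query points (assumed definitions).

    Relevant set of i:  {j != i : ||x_j - x_i|| <= r_u}   (input space)
    Retrieved set of i: {j != i : ||y_j - y_i|| <= r_v}   (embedding space)
    """
    not_self = ~np.eye(dist_in.shape[0], dtype=bool)
    relevant = (dist_in <= r_u) & not_self
    retrieved = (dist_out <= r_v) & not_self
    hits = np.sum(relevant & retrieved, axis=1)
    n_rel = relevant.sum(axis=1)
    n_ret = retrieved.sum(axis=1)
    # Queries with an empty retrieved (or relevant) set are skipped.
    precision = (np.mean(hits[n_ret > 0] / n_ret[n_ret > 0])
                 if np.any(n_ret > 0) else float("nan"))
    recall = (np.mean(hits[n_rel > 0] / n_rel[n_rel > 0])
              if np.any(n_rel > 0) else float("nan"))
    return float(precision), float(recall)


def run(n=2000, d=10, k=100, m_values=range(1, 10), seed=0):
    """Sweep r_V for each embedding dimension m; returns r_U and {m: [(r_v, P, R), ...]}."""
    rng = np.random.default_rng(seed)
    x = sample_unit_ball(n, d, rng)
    dist_in = pairwise_dist(x)
    r_u = radius_for_avg_neighbors(dist_in, k)
    # Candidate retrieval radii from 0.1 * r_U up to r_U; an orthogonal
    # projection never increases distances, so r_V = r_U already
    # retrieves every relevant point.
    r_v_grid = np.linspace(0.1, 1.0, 10) * r_u
    results = {}
    for m in m_values:
        # Assumption: project onto the first m coordinates. By rotational
        # symmetry of the ball this is distributed like any orthogonal projection.
        dist_out = pairwise_dist(x[:, :m])
        results[m] = [(float(r_v),) + precision_recall(dist_in, dist_out, r_u, r_v)
                      for r_v in r_v_grid]
    return r_u, results


if __name__ == "__main__":
    r_u, results = run()
    print(f"r_U = {r_u:.3f}")
    for m, rows in results.items():
        summary = ", ".join(f"r_V={r:.2f}: P={p:.2f} R={rec:.2f}"
                            for r, p, rec in rows[::3])
        print(f"m={m}: {summary}")
```

Because an orthogonal projection is 1-Lipschitz, recall at r_V = r_U is 1 for every m in this sketch. Precision then measures how many non-neighbors the projection pulls into the retrieval ball, which is why shrinking r_V trades recall for precision as m decreases.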