Dimensionality Reduction has Quantifiable Imperfections: Two Geometric Bounds
Authors: Kry Lui, Gavin Weiguang Ding, Ruitong Huang, Robert McCann
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate on a synthetic dataset that our lower bound in Theorem 4 can be a reasonable guidance for selecting the retrieval neighborhood radius $r_V$, which emphasizes on high precision. The simulation environment is to compute the optimal $r_V$ by minimizing the lower bound in Theorem 4, with a given relevant neighborhood radius $r_U$ and embedding dimension m. |
| Researcher Affiliation | Collaboration | Kry Yik Chau Lui, Borealis AI, Canada, yikchau.y.lui@borealisai.com; Gavin Weiguang Ding, Borealis AI, Canada, gavin.ding@borealisai.com; Ruitong Huang, Borealis AI, Canada, ruitong.huang@borealisai.com; Robert J. McCann, Department of Mathematics, University of Toronto, Canada, mccann@math.toronto.edu |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to its own source code. It only mentions using the 'python optimal transport library (POT) [12]' which is a third-party tool. |
| Open Datasets | No | The paper states that the authors 'generate 10000 uniformly distributed samples in a 10-dimensional unit ℓ2-ball', a synthetic dataset of their own, but it provides no concrete access information (link, DOI, or citation) that would make this dataset publicly available. |
| Dataset Splits | No | The paper describes generating a synthetic dataset of '10000 uniformly distributed samples' but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running its simulations or experiments. |
| Software Dependencies | No | The paper mentions that 'Our code is based on python optimal transport library (POT) [12]' but does not provide a specific version number for POT or any other software dependencies. |
| Experiment Setup | Yes | Specifically, we generate 10000 uniformly distributed samples in a 10-dimensional unit ℓ2-ball. We choose $r_U$ such that on average each data point has 500 neighbors inside $B_{r_U}$. We then linearly project these 10-dimensional points into lower dimensional spaces with embedding dimension m from 1 to 9. For each m, a different $r_V$ is used to calculate discrete precision and recall. |
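
The experiment setup quoted above can be illustrated with a minimal NumPy sketch. This is not the authors' code (none is released, and their implementation is based on POT). Several choices here are assumptions made only for illustration: the linear projection simply keeps the first m coordinates, the retrieval radius $r_V$ is set so that each point retrieves the same average number of neighbors as in the original space (the paper instead selects $r_V$ by minimizing the lower bound of Theorem 4), and all function names (`sample_unit_ball`, `radius_for_mean_neighbors`, `precision_recall`, `run_sketch`) are hypothetical. The defaults use fewer points than the paper's 10000, since a dense 10000 × 10000 distance matrix needs roughly 800 MB in double precision.

```python
import numpy as np


def sample_unit_ball(n, d, rng):
    """Draw n points uniformly from the d-dimensional unit l2-ball."""
    g = rng.standard_normal((n, d))
    directions = g / np.linalg.norm(g, axis=1, keepdims=True)
    # A radius distributed as U^(1/d) makes the points uniform in volume.
    radii = rng.random(n) ** (1.0 / d)
    return directions * radii[:, None]


def pairwise_distances(x):
    """Dense Euclidean distance matrix between the rows of x."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)  # clip tiny negatives from round-off
    return np.sqrt(d2)


def radius_for_mean_neighbors(dist, k):
    """Radius at which each point has about k neighbors on average."""
    n = dist.shape[0]
    upper = dist[np.triu_indices(n, 1)]  # each unordered pair once
    # A fraction k / (n - 1) of all pairs must fall within the radius.
    return np.quantile(upper, k / (n - 1))


def precision_recall(dist_high, dist_low, r_u, r_v):
    """Mean discrete precision and recall over all query points."""
    n = dist_high.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    relevant = (dist_high <= r_u) & not_self   # neighbors in the original space
    retrieved = (dist_low <= r_v) & not_self   # neighbors in the embedding
    hits = (relevant & retrieved).sum(axis=1)
    n_ret = retrieved.sum(axis=1)
    n_rel = relevant.sum(axis=1)
    # Points with an empty neighborhood are skipped (NaN) in the average.
    precision = np.where(n_ret > 0, hits / np.maximum(n_ret, 1), np.nan)
    recall = np.where(n_rel > 0, hits / np.maximum(n_rel, 1), np.nan)
    return float(np.nanmean(precision)), float(np.nanmean(recall))


def run_sketch(n=2000, d=10, k=100, seed=0):
    """Return a list of (m, precision, recall) for m = 1, ..., d - 1."""
    rng = np.random.default_rng(seed)
    x = sample_unit_ball(n, d, rng)
    dist_high = pairwise_distances(x)
    r_u = radius_for_mean_neighbors(dist_high, k)
    results = []
    for m in range(1, d):
        # Assumption: the linear projection keeps the first m coordinates.
        dist_low = pairwise_distances(x[:, :m])
        # Assumption: placeholder r_V, NOT the Theorem 4 minimizer.
        r_v = radius_for_mean_neighbors(dist_low, k)
        p, r = precision_recall(dist_high, dist_low, r_u, r_v)
        results.append((m, p, r))
    return results
```

Calling `run_sketch()` returns one `(m, precision, recall)` tuple per embedding dimension. Passing `n=10000, k=500` matches the sample count and neighborhood size quoted in the table, at a correspondingly higher memory cost.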