Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Dimensionality Reduction has Quantifiable Imperfections: Two Geometric Bounds
Authors: Kry Lui, Gavin Weiguang Ding, Ruitong Huang, Robert McCann
NeurIPS 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we demonstrate on a synthetic dataset that our lower bound in Theorem 4 can be a reasonable guidance for selecting the retrieval neighborhood radius $r_V$, which emphasizes on high precision. The simulation environment is to compute the optimal $r_V$ by minimizing the lower bound in Theorem 4, with a given relevant neighborhood radius $r_U$ and embedding dimension $m$. |
| Researcher Affiliation | Collaboration | Kry Yik Chau Lui, Borealis AI, Canada; Gavin Weiguang Ding, Borealis AI, Canada; Ruitong Huang, Borealis AI, Canada; Robert J. McCann, Department of Mathematics, University of Toronto, Canada |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide concrete access to its own source code. It only mentions using the 'python optimal transport library (POT) [12]', which is a third-party tool. |
| Open Datasets | No | The paper states that the authors 'generate 10000 uniformly distributed samples in a 10-dimensional unit $\ell_2$-ball'. This is a synthetic dataset generated by the authors, but the paper provides no concrete access information (link, DOI, or citation) that would make this specific dataset publicly available. |
| Dataset Splits | No | The paper describes generating a synthetic dataset of '10000 uniformly distributed samples' but does not specify any training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not provide any specific hardware details used for running its simulations or experiments. |
| Software Dependencies | No | The paper mentions that 'Our code is based on python optimal transport library (POT) [12]' but does not provide a specific version number for POT or any other software dependencies. |
| Experiment Setup | Yes | Specifically, we generate 10000 uniformly distributed samples in a 10-dimensional unit $\ell_2$-ball. We choose $r_U$ such that on average each data point has 500 neighbors inside $B_{r_U}$. We then linearly project these 10 dimensional points into lower dimensional spaces with embedding dimension $m$ from 1 to 9. For each $m$, a different $r_V$ is used to calculate discrete precision and recall. |
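
The simulation in the Experiment Setup row can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code (which is not released). It assumes that discrete precision and recall are per-point ratios averaged over the dataset. For each point, the relevant set is taken to be the other points within $r_U$ in the original space, and the retrieved set the other points within $r_V$ after projection. It also assumes the linear map simply keeps the first $m$ coordinates. Because the uniform ball is rotation invariant, this is equivalent in distribution to any orthogonal projection. The Theorem 4 bound used to pick $r_V$ is not reproduced here, so the sketch instead sweeps a hypothetical grid of $r_V$ values expressed as multiples of $r_U$. All function and parameter names (`sweep`, `r_v_scales`, and so on) are illustrative.

```python
import numpy as np


def sample_unit_ball(n, d, rng):
    """Draw n points uniformly from the unit l2-ball in R^d."""
    x = rng.standard_normal((n, d))
    x /= np.linalg.norm(x, axis=1, keepdims=True)  # uniform direction on the sphere
    radii = rng.random(n) ** (1.0 / d)             # radius ~ U^(1/d) gives uniform volume density
    return x * radii[:, None]


def pairwise_distances(x):
    """Dense Euclidean distance matrix; the diagonal is +inf so a point is never its own neighbor."""
    sq = np.einsum("ij,ij->i", x, x)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    np.maximum(d2, 0.0, out=d2)                    # clip tiny negatives caused by rounding
    dist = np.sqrt(d2)
    np.fill_diagonal(dist, np.inf)
    return dist


def radius_for_avg_neighbors(dist, k):
    """Radius r_U such that, on average, each point has k neighbors at distance <= r_U."""
    n = dist.shape[0]
    idx = n * k - 1                                # the (n*k)-th smallest off-diagonal distance
    return float(np.partition(dist.ravel(), idx)[idx])


def _mean_ratio(num, den):
    """Average of num/den over points with a non-empty denominator set (NaN if there are none)."""
    mask = den > 0
    return float(np.mean(num[mask] / den[mask])) if mask.any() else float("nan")


def discrete_precision_recall(dist_u, dist_v, r_u, r_v):
    """Assumed definition: per-point precision/recall of B_{r_V} (embedded) vs B_{r_U} (original), averaged."""
    relevant = dist_u <= r_u
    retrieved = dist_v <= r_v
    hits = np.sum(relevant & retrieved, axis=1)
    precision = _mean_ratio(hits, retrieved.sum(axis=1))
    recall = _mean_ratio(hits, relevant.sum(axis=1))
    return precision, recall


def sweep(n=2000, d=10, k=100, r_v_scales=(0.25, 0.5, 0.75, 1.0), seed=0):
    """For each embedding dimension m = 1..d-1, evaluate a hypothetical grid of r_V = scale * r_U."""
    rng = np.random.default_rng(seed)
    x = sample_unit_ball(n, d, rng)
    dist_u = pairwise_distances(x)
    r_u = radius_for_avg_neighbors(dist_u, k)
    results = {}
    for m in range(1, d):
        # Assumed linear map: keep the first m coordinates (an orthogonal projection).
        dist_v = pairwise_distances(x[:, :m])
        results[m] = [
            (s * r_u, *discrete_precision_recall(dist_u, dist_v, r_u, s * r_u))
            for s in r_v_scales
        ]
    return r_u, results
```

For example, `r_u, results = sweep()` returns $r_U$ and, for each $m$, a list of `(r_V, precision, recall)` triples. With the paper's 10000 points and 500 neighbors, each dense distance matrix holds $10^8$ entries (roughly 800 MB in float64). The defaults here are therefore scaled down to 2000 points and 100 neighbors, which keeps the same 5% neighbor fraction. Keeping the first $m$ coordinates never increases distances, so any $r_V \ge r_U$ gives recall 1 in this sketch. Any $r_V$ larger than $r_U$ only adds retrieved points without adding hits, so candidates for a high-precision retrieval radius lie at or below $r_U$.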