Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et alK. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026..

Johnson-Lindenstrauss Lemma Beyond Euclidean Geometry

Authors: Chengyuan Deng, Jie Gao, Kevin Lu, Feng Luo, Cheng Xin

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We validate our approaches on both synthetic and real-world datasets, demonstrating the effectiveness of extending the JL lemma to non-Euclidean settings. In this paper, we study how to apply the Johnson-Lindenstrauss transform (a.k.a. random linear projection) in non-Euclidean, non-metric settings. We provide theoretical analysis on the performance of the JL transform in these settings and show experiments with both synthetic and real-world data. Experiments. We implement both methods of JL transform with respect to the above results and evaluate their performances on 10 datasets. We observe that the experiment results corroborate with our theoretical results, and outperform classical JL transform consistently on non-Euclidean datasts.
Researcher Affiliation Academia Rutgers University. EMAIL
Pseudocode Yes 4 Algorithms for Non-Euclidean Johnson-Lindenstrauss Transforms The algorithms used for the experiments are presented as follows. In both cases we assume an input dissimilarity matrix D which is symmetric (Dij = Dji) and hollow (Dii = 0). Pseudo-Euclidean JL Transform 1. Find the orthogonal decomposition of the dissimilarity matrix D = OΛOT where Λ is a diagonal matrix of the eigenvalues. 2. Let A be the (p, q) signature, i.e. A = sign(Λ) 3. Compute the the embedding into Rp,q by finding V AV T . 4. For each point x in the embedding, split it into x(p) and x(q). Use standard JL transform to project into Rp and Rq respectively where p and q are the specified target dimension. 5. Return the projected points (x(p ), x(q )) along with their (p , q ) signature. Power distance JL Transform 1. Find the orthogonal decomposition of the dissimilarity matrix OΛOT 2. Compute en, the smallest eigenvalue of Gram(D). Set r = p 3. Add 4r2(J I) to D to obtain the new Euclidean distance matrix, E. 4. Recover the Euclidean coordinates X such that Eij = x i x j 2. Perform standard JL transform on X to target dimension m. 5. Return the projected points along with their radii r.
Open Source Code Yes Our codes are on the Anonymous Github2. 2https://anonymous.4open.science/r/Non-Euclidean-Johnson-Lindenstrauss-1673
Open Datasets Yes We consider three categories: genomics, image and graph data. The genomics data includes three cancer-related datasets from the Curated Microarray Database (Cu Mi Da) [54]. Following the practice in prior work [18], we obtain dissimilarities with entropic affinity. We also test two celebrated image datasets: MNIST and CIFAR-10, each with 1000 images randomly sampled. We use the measures mentioned in [55] to calculate the dissimilarities. The graph datasets are selected from the SNAP project [56].
Dataset Splits No No explicit dataset split information (e.g., percentages, counts, or methodology for train/test/validation sets) is provided in the paper. The paper mentions randomly sampling 1000 images for MNIST and CIFAR-10, but no further split details for experiments are given.
Hardware Specification Yes All experiments are done with a Macbook Pro with Apple M2 Chip of 16GB memory. The algorithms are fairly efficient and we did not encounter issues on computational resources.
Software Dependencies No The paper does not explicitly mention specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes The JL Transforms have only one parameter ε, and we set that to be 0.5 throughout the paper. The constant used in the big O notation of O(log n/ε2) is 2.