Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Noisy Laplacian: A Threshold Phenomenon for Non-Linear Dimension Reduction
Authors: Alex Kokot, Octavian-Vlad Murad, Marina Meila
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate our theoretical predictions. Additionally, we observe similar robust behavior for other manifold learning algorithms which are not based on computing the Laplacian, namely LTSA and VAE. From Section 4 (Experiments): We experimentally verify the extent to which the approximation in Theorem 3.1 holds for both synthetic and real datasets. |
| Researcher Affiliation | Academia | 1Department of Statistics, University of Washington, Seattle, USA 2Computer Science and Engineering, University of Washington, Seattle, USA. Correspondence to: Alex Kokot <EMAIL>. |
| Pseudocode | No | The paper describes methods like Diffusion Maps, VAE, and LTSA in textual and mathematical form, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it include any links to a code repository. |
| Open Datasets | Yes | For our real data experiments we use Molecular Dynamics Simulation (MDS) datasets consisting of atomic configurations from three molecules: Toluene (Tol), Malonaldehyde (Mal), and Ethanol (Eth) from (28). (28) S. Chmiela, A. Tkatchenko, H. Sauceda, I. Poltavsky, K. T. Schütt, and K.-R. Müller, "Machine learning of accurate energy-conserving molecular force fields," Science Advances, March 2017. |
| Dataset Splits | No | The paper states, 'We sample n = 5000 points' for synthetic data and 'We sample n = 7500 points for all our experiments' for real data, but it does not provide details on how these samples are split into training, validation, or test sets. |
| Hardware Specification | No | The paper mentions 'These simulations require massive compute power' in the context of Molecular Dynamic Simulations (MDS) datasets, but it does not provide any specific hardware details (like GPU/CPU models or types) used for running its own experiments. |
| Software Dependencies | No | The paper mentions using techniques like Layer Normalization, GELU activation, and Adam optimizer, and algorithms such as Diffusion Maps, VAE, and LTSA. However, it does not provide specific version numbers for any software libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | For all our VAE embeddings we use the same network, which has an encoder with FC layers of sizes (64, 128), an embedding size of m = 4, and a decoder with FC layers of sizes (128, 64). We use Layer Normalization (42) and GELU activation (43) between the hidden layers, a batch size of 256, the Adam optimizer (44), and a weight of 0.1 on the KL-Divergence loss relative to the reconstruction loss. For all our LTSA embeddings we use an embedding size of m = 12 and k = 32 nearest neighbors without a cutoff radius. For most of our experiments we use an embedding size of m = 24, but in some cases we use m > 24. |
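The VAE configuration quoted above can be sketched as follows. This is a minimal PyTorch reconstruction under stated assumptions: the paper does not specify the ambient input dimension (here `input_dim=24`, matching the most common embedding size mentioned), nor whether a Gaussian reparameterized latent is used, which is assumed below. The layer sizes (64, 128), latent size m = 4, LayerNorm, GELU, and the 0.1 KL weight are taken from the setup; everything else is an illustrative choice, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Sketch of the described VAE: encoder FC (64, 128), latent size
    m = 4, decoder FC (128, 64), LayerNorm + GELU between hidden layers.
    `input_dim` is an assumption; the paper's ambient dimension varies."""
    def __init__(self, input_dim: int = 24, latent_dim: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 64), nn.LayerNorm(64), nn.GELU(),
            nn.Linear(64, 128), nn.LayerNorm(128), nn.GELU(),
        )
        # Separate heads for the Gaussian posterior parameters (assumed).
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.LayerNorm(128), nn.GELU(),
            nn.Linear(128, 64), nn.LayerNorm(64), nn.GELU(),
            nn.Linear(64, input_dim),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z ~ N(mu, exp(logvar)).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def vae_loss(x, recon, mu, logvar, kl_weight: float = 0.1):
    # Reconstruction loss plus the KL term weighted by 0.1, per the setup;
    # MSE reconstruction is an assumption (the paper does not specify).
    recon_loss = nn.functional.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl_weight * kl
```

Training would then iterate over batches of size 256 with `torch.optim.Adam`, as stated in the setup; optimizer hyperparameters (learning rate, betas) are not reported in the table.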