Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
The Noisy Laplacian: A Threshold Phenomenon for Non-Linear Dimension Reduction
Authors: Alex Kokot, Octavian-Vlad Murad, Marina Meila
ICML 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experimentally validate our theoretical predictions. Additionally, we observe similar robust behavior for other manifold learning algorithms which are not based on computing the Laplacian, namely LTSA and VAE. From Section 4 (Experiments): We experimentally verify the extent to which the approximation in Theorem 3.1 holds for both synthetic and real datasets. |
| Researcher Affiliation | Academia | 1Department of Statistics, University of Washington, Seattle, USA 2Computer Science and Engineering, University of Washington, Seattle, USA. Correspondence to: Alex Kokot <EMAIL>. |
| Pseudocode | No | The paper describes methods like Diffusion Maps, VAE, and LTSA in textual and mathematical form, but it does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the methodology described, nor does it include any links to a code repository. |
| Open Datasets | Yes | For our real data experiments we use Molecular Dynamics Simulation (MDS) datasets consisting of atomic configurations from three molecules: Toluene (Tol), Malonaldehyde (Mal), and Ethanol (Eth) from (28). (28) S. Chmiela, A. Tkatchenko, H. Sauceda, I. Poltavsky, K. T. Schütt, and K.-R. Müller, "Machine learning of accurate energy-conserving molecular force fields," Science Advances, March 2017. |
| Dataset Splits | No | The paper states, 'We sample n = 5000 points' for synthetic data and 'We sample n = 7500 points for all our experiments' for real data, but it does not provide details on how these samples are split into training, validation, or test sets. |
| Hardware Specification | No | The paper mentions 'These simulations require massive compute power' in the context of Molecular Dynamic Simulations (MDS) datasets, but it does not provide any specific hardware details (like GPU/CPU models or types) used for running its own experiments. |
| Software Dependencies | No | The paper mentions using techniques like Layer Normalization, GELU activation, and Adam optimizer, and algorithms such as Diffusion Maps, VAE, and LTSA. However, it does not provide specific version numbers for any software libraries or frameworks used in the implementation. |
| Experiment Setup | Yes | For all our VAE embeddings we use the same network, which has an encoder with FC layers of sizes (64, 128), an embedding size of m = 4, and a decoder with FC layers of sizes (128, 64). We use Layer Normalization (42) and GELU activation (43) between the hidden layers, a batch size of 256, the Adam optimizer (44), and a weight of 0.1 on the KL-Divergence loss relative to the reconstruction loss. For all our LTSA embeddings we use an embedding size of m = 12 and k = 32 nearest neighbors without a cutoff radius. For most of our experiments we use an embedding size of m = 24, but in some cases we use m > 24. |
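The VAE configuration quoted above can be sketched as follows. This is a minimal PyTorch reconstruction under stated assumptions: the paper does not specify the ambient input dimension (here `input_dim=24`, matching the most common embedding size mentioned), nor whether a Gaussian reparameterized latent is used, which is assumed below. The layer sizes (64, 128), latent size m = 4, LayerNorm, GELU, and the 0.1 KL weight are taken from the setup; everything else is an illustrative choice, not the authors' implementation.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    """Sketch of the described VAE: encoder FC (64, 128), latent size
    m = 4, decoder FC (128, 64), LayerNorm + GELU between hidden layers.
    `input_dim` is an assumption; the paper's ambient dimension varies."""
    def __init__(self, input_dim: int = 24, latent_dim: int = 4):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 64), nn.LayerNorm(64), nn.GELU(),
            nn.Linear(64, 128), nn.LayerNorm(128), nn.GELU(),
        )
        # Separate heads for the Gaussian posterior parameters (assumed).
        self.to_mu = nn.Linear(128, latent_dim)
        self.to_logvar = nn.Linear(128, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.LayerNorm(128), nn.GELU(),
            nn.Linear(128, 64), nn.LayerNorm(64), nn.GELU(),
            nn.Linear(64, input_dim),
        )

    def forward(self, x):
        h = self.encoder(x)
        mu, logvar = self.to_mu(h), self.to_logvar(h)
        # Reparameterization trick: sample z ~ N(mu, exp(logvar)).
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.decoder(z), mu, logvar

def vae_loss(x, recon, mu, logvar, kl_weight: float = 0.1):
    # Reconstruction loss plus the KL term weighted by 0.1, per the setup;
    # MSE reconstruction is an assumption (the paper does not specify).
    recon_loss = nn.functional.mse_loss(recon, x)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_loss + kl_weight * kl
```

Training would then iterate over batches of size 256 with `torch.optim.Adam`, as stated in the setup; optimizer hyperparameters (learning rate, betas) are not reported in the table.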