Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Distributional Autoencoders Know the Score

Authors: Andrej Leban

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 Experiments So far, the results presented are at the population level. In all the experiments below, the decoder is an Engression network, for which non-asymptotic finite-sample error bounds are available (see Thm. 3 in [19]), so deviations from the population predictions shrink with the sample size. The encoder is a standard MLP throughout, so usual generalization arguments apply. 4.1 Level Set Score Alignment We present score alignment results for examples where the data density P_data is known. ... Figures 1, 2, 6, 7 and Tables 1, 2, 3, 5, 6.
Researcher Affiliation	Academia	Andrej Leban Department of Statistics, University of Michigan, Ann Arbor, MI, United States EMAIL
Pseudocode	No	The paper describes methods using mathematical equations (e.g., Eq. 2, Eq. 7, Eq. 10) and detailed textual explanations within sections like '2 Optimal encoder level sets align with the data score' and '3 On (approximately) parameterizable manifolds, extraneous latents are uninformative', but does not include any structured pseudocode or algorithm blocks.
Open Source Code	Yes	Code available at: github.com/andleb/Distributional Autoencoders Score. The code required to reproduce the results is provided at github.com/andleb/Distributional Autoencoders Score. Please refer to README.md document therein for full details on the code structure and details.
Open Datasets	Yes	The data for the Müller-Brown potential examples is bundled with the examples provided in the mlcolvar project [4]: https://github.com/luigibonati/mlcolvar, available under the MIT license. Additionally, the data for the Müller-Brown potential examples is bundled with the examples provided in the mlcolvar repository, and reproduced in this supplement for convenience.
Dataset Splits	No	The paper mentions 'Training data: 10,000 Müller-Brown MD samples' and 'All metrics are computed on a held-out test set' for the S-curve dataset, but it does not specify explicit percentages or counts for training, validation, and test splits for any of the datasets used.
Hardware Specification	Yes	The MFEP parameterization experiment was run on a single Nvidia V100 GPU, taking 1 hour and 38 seconds of wall time. The Independence experiments took roughly 5 hours on a single Nvidia V100 GPU.
Software Dependencies	Yes	All other dependencies are installed via pip (cf. the requirements.txt file provided); their names, versions and licences appear in third_party_licenses/THIRD_PARTY_LICENSES.md.
Experiment Setup	Yes	DPA models trained with β = 2, deterministic encoder, stochastic decoder, 3 latent dimensions (k = 3), 4-layer networks with 256 hidden units per layer, residual blocks enabled. Training used standardized inputs. ... All models use 2D latent space, encoder/decoder with two hidden layers of 100 units, ELU activations. Training: 1200 epochs, Adam optimizer with lr = 10^-3 (VAE variants) or 5 * 10^-4 (DPA), batch size 5000 (DPA/AE/VAE) or 256 (TC-VAE to ensure batch 2 for log-density-ratio estimation).
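
The second group of settings quoted in the Experiment Setup row (the 2D-latent models) could be collected into a single configuration object along the following lines. This is only a sketch for readers who want the numbers in machine-readable form: the names `TrainConfig` and `comparison_config` are hypothetical and do not come from the paper or its repository. Two points are assumptions rather than quoted facts: "VAE variants" is read here as covering both VAE and TC-VAE, and the learning rate for the plain AE is left unset because the quoted text does not give one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the hyperparameters quoted in the table row."""
    model: str
    latent_dim: int
    hidden_layers: int
    hidden_units: int
    activation: str
    optimizer: str
    epochs: int
    learning_rate: Optional[float]  # None where the quoted text gives no value
    batch_size: int


def comparison_config(model: str) -> TrainConfig:
    """Return the quoted settings for one of the 2D-latent models."""
    if model not in {"DPA", "AE", "VAE", "TC-VAE"}:
        raise ValueError(f"unknown model: {model!r}")

    # Quoted: lr = 10^-3 (VAE variants) or 5 * 10^-4 (DPA).
    # Assumption: "VAE variants" means VAE and TC-VAE; the AE value is not stated.
    if model == "DPA":
        learning_rate = 5e-4
    elif model in {"VAE", "TC-VAE"}:
        learning_rate = 1e-3
    else:
        learning_rate = None

    # Quoted: batch size 5000 (DPA/AE/VAE) or 256 (TC-VAE).
    batch_size = 256 if model == "TC-VAE" else 5000

    return TrainConfig(
        model=model,
        latent_dim=2,        # "All models use 2D latent space"
        hidden_layers=2,     # "two hidden layers of 100 units"
        hidden_units=100,
        activation="ELU",
        optimizer="Adam",
        epochs=1200,
        learning_rate=learning_rate,
        batch_size=batch_size,
    )


if __name__ == "__main__":
    for name in ("DPA", "AE", "VAE", "TC-VAE"):
        print(comparison_config(name))
```

The first group of settings in the same row (β = 2, k = 3, 4-layer networks with 256 hidden units, residual blocks) is left out of the sketch because the quoted text does not state its epochs, learning rate, or batch size.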