Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
High-dimensional Asymptotics of Denoising Autoencoders
Authors: Hugo Cui, Lenka Zdeborová
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We further show that our results accurately capture the learning curves on a range of real data sets. (...) We show that these formulae also describe quantitatively rather well the denoising MSE for real data sets, including MNIST [15] and Fashion MNIST [16]. |
| Researcher Affiliation | Academia | Hugo Cui, Statistical Physics of Computation Lab, Department of Physics, EPFL, Lausanne, Switzerland; Lenka Zdeborová, Statistical Physics of Computation Lab, Department of Physics, EPFL, Lausanne, Switzerland |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code used in the present manuscript can be found in the following repository. |
| Open Datasets | Yes | We show that these formulae also describe quantitatively rather well the denoising MSE for real data sets, including MNIST [15] and Fashion MNIST [16]. (...) For each data set, samples sharing the same label were considered to belong to the same cluster. (...) For each cluster, the corresponding mean µ and covariance Σ were numerically evaluated from the empirical mean and covariance over the 6000 boots (shoes) in the Fashion MNIST training set, and the 6265 1s (7s) in the MNIST training set. |
| Dataset Splits | No | The paper mentions training and test sets but does not explicitly specify a separate validation set or its split. |
| Hardware Specification | No | The paper mentions using "Pytorch implementation" for numerical simulations but does not specify any particular hardware details such as GPU/CPU models, memory, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using "Pytorch implementation of full-batch Adam [43]" but does not specify version numbers for PyTorch or Python, which are needed to reproduce the software environment. |
| Experiment Setup | Yes | Dots represent numerical simulations for d = 700, training the DAE using the Pytorch implementation of full-batch Adam, with learning rate η = 0.05 over 2000 epochs, averaged over N = 10 instances. Error bars represent one standard deviation. (...) training a DAE (p = 1, σ = tanh) trained with n = 784 training points, using the Pytorch implementation of full-batch Adam, with learning rate η = 0.05 and weight decay λ = 0.1 over 2000 epochs, averaged over N = 10 instances. |
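The Open Datasets row describes how the cluster statistics were obtained: samples sharing a label form one cluster, and µ and Σ are the empirical mean and covariance over that cluster's training images. Below is a minimal NumPy sketch, assuming the images arrive as an array with integer labels. `per_label_statistics` is a hypothetical helper. The unbiased (n − 1) normalisation that `np.cov` uses by default is also an assumption, since the report does not say which normalisation the authors used.

```python
import numpy as np


def per_label_statistics(images, labels, selected_labels):
    """Empirical mean and covariance for each selected label (one cluster per label)."""
    # Flatten each image to a vector, e.g. d = 784 for 28x28 MNIST images.
    flat = images.reshape(len(images), -1).astype(np.float64)
    stats = {}
    for label in selected_labels:
        members = flat[labels == label]           # all samples sharing this label
        mu = members.mean(axis=0)                 # empirical mean, shape (d,)
        sigma = np.cov(members, rowvar=False)     # empirical covariance, shape (d, d), n - 1 normalisation
        stats[label] = (mu, sigma)
    return stats
```

For MNIST, a call such as `per_label_statistics(train_images, train_labels, [1, 7])` would return the statistics of the 1s and 7s clusters. Here `train_images` and `train_labels` are placeholder names for the loaded training set.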
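The Experiment Setup row describes the training protocol. A DAE with p = 1 hidden unit and σ = tanh is trained by full-batch Adam with learning rate η = 0.05, weight decay λ = 0.1, and 2000 epochs. Results are averaged over N = 10 instances, and error bars show one standard deviation. The NumPy sketch below illustrates that protocol. It is not the authors' code and rests on several assumptions this report does not state:

- the DAE form f(x̃) = b·x̃ + w·tanh(w·x̃/√d)/√d, with a trainable scalar skip connection b;
- the noise model x̃ = √(1−Δ)·x + √Δ·ξ;
- the loss normalisation (averaged over samples and coordinates);
- the reading of "instances" as independent noise and initialisation draws.

Adam is written out by hand with PyTorch's default β₁ = 0.9, β₂ = 0.999, ε = 1e-8. Weight decay follows PyTorch's L2-style convention, where λ·θ is added to the gradient. All function names are hypothetical.

```python
import numpy as np


def add_noise(x, delta, rng):
    """Assumed noise model: x~ = sqrt(1 - delta) * x + sqrt(delta) * xi, xi ~ N(0, I)."""
    return np.sqrt(1.0 - delta) * x + np.sqrt(delta) * rng.standard_normal(x.shape)


def dae_forward(x_noisy, w, b):
    """Assumed p = 1 DAE: f(x~) = b * x~ + w * tanh(w . x~ / sqrt(d)) / sqrt(d)."""
    d = x_noisy.shape[1]
    a = np.tanh(x_noisy @ w / np.sqrt(d))              # (n,) hidden activations
    return b * x_noisy + np.outer(a, w) / np.sqrt(d), a


def loss_and_grads(x_clean, x_noisy, w, b):
    """Denoising MSE averaged over samples and coordinates, with analytic gradients."""
    n, d = x_clean.shape
    out, a = dae_forward(x_noisy, w, b)
    r = out - x_clean                                  # (n, d) residuals
    loss = np.sum(r ** 2) / (n * d)
    grad_b = 2.0 * np.sum(r * x_noisy) / (n * d)
    rw = r @ w                                         # (n,) residuals projected on w
    grad_w = (2.0 / (n * d)) * (
        a @ r / np.sqrt(d)                             # w acting as decoder weights
        + (rw * (1.0 - a ** 2)) @ x_noisy / d          # w inside tanh (encoder)
    )
    return loss, grad_w, grad_b


def train_dae_adam(x_clean, x_noisy, lr=0.05, weight_decay=0.1, epochs=2000, seed=0):
    """Full-batch Adam with L2 weight decay added to the gradient (PyTorch-style)."""
    rng = np.random.default_rng(seed)
    params = {"w": rng.standard_normal(x_clean.shape[1]), "b": np.array(0.0)}
    m1 = {k: np.zeros_like(p) for k, p in params.items()}   # first-moment estimates
    m2 = {k: np.zeros_like(p) for k, p in params.items()}   # second-moment estimates
    beta1, beta2, eps = 0.9, 0.999, 1e-8
    losses = []
    for t in range(1, epochs + 1):
        loss, gw, gb = loss_and_grads(x_clean, x_noisy, params["w"], params["b"])
        losses.append(loss)                            # MSE before this update
        grads = {"w": gw, "b": gb}
        for k in params:
            g = grads[k] + weight_decay * params[k]
            m1[k] = beta1 * m1[k] + (1 - beta1) * g
            m2[k] = beta2 * m2[k] + (1 - beta2) * g ** 2
            m_hat = m1[k] / (1 - beta1 ** t)           # bias-corrected moments
            v_hat = m2[k] / (1 - beta2 ** t)
            params[k] = params[k] - lr * m_hat / (np.sqrt(v_hat) + eps)
    return params, losses


def run_experiment(x_train, x_test, delta=0.5, n_instances=10, **train_kwargs):
    """Mean and one standard deviation of the test denoising MSE over instances."""
    test_mses = []
    for seed in range(n_instances):
        rng = np.random.default_rng(1000 + seed)
        params, _ = train_dae_adam(x_train, add_noise(x_train, delta, rng),
                                   seed=seed, **train_kwargs)
        mse, _, _ = loss_and_grads(x_test, add_noise(x_test, delta, rng),
                                   params["w"], params["b"])
        test_mses.append(mse)
    return float(np.mean(test_mses)), float(np.std(test_mses))
```

With real data, `x_train` and `x_test` would hold flattened images, and `epochs` would be 2000. The defaults `lr=0.05` and `weight_decay=0.1` mirror the second configuration quoted in the table. Writing the gradients by hand keeps the sketch dependency-free, whereas the paper itself relied on PyTorch's Adam with automatic differentiation.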