Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Significance Tests for Neural Networks

Authors: Enguerrand Horel, Kay Giesecke

JMLR 2020 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type: Experimental. Simulation results illustrate the computational efficiency and the performance of the test. An empirical application to house price valuation highlights the behavior of the test using actual data. We estimate the power and size of the test by performing it on 250 alternative data sets (Y_i, X_i)_{i=1}^n generated from the model (26). We use the significance test to study the variables influencing house prices in the United States. We analyze a data set of 76,247 housing transactions in California's Merced County between 1970 and 2017.
Researcher Affiliation: Academia. Enguerrand Horel (EMAIL), Institute for Computational and Mathematical Engineering, Stanford University, Stanford, CA 94305, USA; Kay Giesecke (EMAIL), Department of Management Science and Engineering, Stanford University, Stanford, CA 94305, USA.
Pseudocode: No. The paper describes its methods in prose and mathematical formulations but does not include any explicit pseudocode or algorithm blocks.
Open Source Code: No. The paper states: "We fit a fully-connected feed-forward neural network with one hidden layer and sigmoid activation function to the training set using the Tensor Flow package." It references third-party software (TensorFlow) but provides no statement about or link to the authors' own implementation code.
Open Datasets: No. Section 5.1 describes a synthetic data-generating process for the simulation experiments. Section 6 states: "We analyze a data set of 76,247 housing transactions in California's Merced County between 1970 and 2017. The data are obtained from the county's registrar of deed office through the data vendor Core Logic." While the original source (the county registrar) might be public, the data as used was explicitly obtained through a commercial vendor, and no public access to it is described.
Dataset Splits: Yes. We generate a training set of n = 100,000 independent samples and validation and testing sets of 10,000 independent samples each. 70% of the data is used for training, 20% for validation, and the remainder is used for testing.
Hardware Specification: No. The paper mentions using the "Tensor Flow package" for fitting neural networks but provides no details about the hardware (e.g., GPU or CPU models) used to run the experiments.
Software Dependencies: No. The paper mentions using the "Tensor Flow package" but does not specify a version number for it or for any other software.
Experiment Setup: Yes. Simulation study: We employ the Adam stochastic optimization method with step size 0.001 and exponential decay rates for the moment estimates of 0.9 and 0.999. We use a batch size of 32, a maximum of 150 epochs, and an early stopping criterion that stops training when the validation error has not decreased by more than 10^-5 for at least 5 epochs. The number of hidden nodes is chosen so as to minimize the validation loss; a network with 25 hidden units performs best. Empirical application: We employ the Adam stochastic optimization method with step size 0.001 and exponential decay rates for the moment estimates of 0.9 and 0.999. We use a batch size of 32, a maximum of 100 epochs, and an early stopping criterion that stops training when the validation error has not decreased by more than 10^-3 for at least 10 epochs. The number of hidden nodes and the regularization weight are chosen so as to minimize the validation loss; the optimal network architecture has 150 hidden units and the optimal regularization weight is 10^-5.
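The split and early-stopping details reported above can be sketched in plain Python. This is a minimal illustration, not the authors' code (which was not released): `split_indices` and `early_stop_epoch` are hypothetical helper names, the model fitting itself (a one-hidden-layer sigmoid network trained with Adam in TensorFlow) is omitted, and the thresholds shown match the empirical-application setting (70/20/10 split, min improvement 10^-5, patience 5 epochs).

```python
import numpy as np

def split_indices(n, train=0.7, val=0.2, seed=0):
    """Shuffle n sample indices and split them 70/20/10 as described."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(train * n)
    n_val = int(val * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

def early_stop_epoch(val_errors, min_delta=1e-5, patience=5):
    """Return the epoch at which training would stop, or None.

    Stops once the validation error has not decreased by more than
    `min_delta` for `patience` consecutive epochs.
    """
    best = float("inf")
    wait = 0
    for epoch, err in enumerate(val_errors):
        if best - err > min_delta:  # improvement beyond the threshold
            best = err
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None  # training runs to the maximum number of epochs
```

For example, `split_indices(100_000)` yields disjoint index sets of 70,000, 20,000, and 10,000 samples, and `early_stop_epoch` applied to a validation-error trace that plateaus returns the epoch at which the patience counter is exhausted.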