Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Significance Tests for Neural Networks
Authors: Enguerrand Horel, Kay Giesecke
JMLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Simulation results illustrate the computational efficiency and the performance of the test. An empirical application to house price valuation highlights the behavior of the test using actual data. We estimate the power and size of the test by performing it on 250 alternative data sets (Y_i, X_i), i = 1, ..., n, generated from the model (26). We use the significance test to study the variables influencing house prices in the United States. We analyze a data set of 76,247 housing transactions in California's Merced County between 1970 and 2017. |
| Researcher Affiliation | Academia | Enguerrand Horel EMAIL Institute for Computational and Mathematical Engineering Stanford University Stanford, CA 94305, USA; Kay Giesecke EMAIL Department of Management Science and Engineering Stanford University Stanford, CA 94305, USA |
| Pseudocode | No | The paper describes the methods in prose and mathematical formulations but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper states: "We fit a fully-connected feed-forward neural network with one hidden layer and sigmoid activation function to the training set using the Tensor Flow package." It references a third-party software (TensorFlow) but does not provide any statement or link for the authors' own implementation code. |
| Open Datasets | No | Section 5.1 describes a synthetic "Data-Generating Process" for simulation experiments. Section 6 states: "We analyze a data set of 76,247 housing transactions in California's Merced County between 1970 and 2017. The data are obtained from the county's registrar of deed office through the data vendor CoreLogic." While the original source (county registrar) might be public, the data was explicitly obtained through a commercial vendor, and no public access link or data release is provided. |
| Dataset Splits | Yes | We generate a training set of n = 100,000 independent samples and validation and testing sets of 10,000 independent samples each. 70% of the data is used for training, 20% for validation, and the remainder is used for testing. |
| Hardware Specification | No | The paper mentions using "Tensor Flow package" for fitting neural networks but does not provide any specific details about the hardware (e.g., GPU, CPU models) used for running experiments. |
| Software Dependencies | No | The paper mentions using "Tensor Flow package" but does not specify a version number for it or any other software. |
| Experiment Setup | Yes | For the simulation study: We employ the Adam stochastic optimization method with step size 0.001 and exponential decay rates for the moment estimates of 0.9 and 0.999. We use a batch size of 32, a maximum number of 150 epochs, and an early stopping criterion that stops the training when the validation error has not decreased by more than 10^-5 for at least 5 epochs. The number of hidden nodes is chosen so as to minimize the validation loss; a network with 25 hidden units performs best. For the housing-data application: We employ the Adam stochastic optimization method with step size 0.001 and exponential decay rates for the moment estimates of 0.9 and 0.999. We use a batch size of 32, a maximum number of 100 epochs, and an early stopping criterion that stops the training when the validation error has not decreased by more than 10^-3 for at least 10 epochs. The number of hidden nodes and the regularization weight are chosen so as to minimize the validation loss; the optimal network architecture has 150 hidden units and the optimal regularization weight is 10^-5. |
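The early-stopping rule reported in the Experiment Setup row (stop when the validation error has not decreased by more than a threshold for a given number of epochs) can be sketched as follows. This is a minimal illustration, not the authors' code: the class name and interface are hypothetical, and only the stopping logic mirrors the reported settings (min improvement 10^-5 with patience 5 epochs for the simulation study; 10^-3 with patience 10 for the housing data).

```python
class EarlyStopping:
    """Stop training when the validation error has not decreased by
    more than `min_delta` for `patience` consecutive epochs.

    Hypothetical sketch of the criterion described in the paper;
    defaults match the simulation-study settings (10^-5, 5 epochs)."""

    def __init__(self, min_delta=1e-5, patience=5):
        self.min_delta = min_delta
        self.patience = patience
        self.best = float("inf")       # best validation loss seen so far
        self.stale_epochs = 0          # epochs without meaningful improvement

    def should_stop(self, val_loss):
        if self.best - val_loss > self.min_delta:
            # Validation loss improved by more than the threshold:
            # record the new best and reset the staleness counter.
            self.best = val_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience
```

In a training loop, `should_stop` would be called once per epoch with the current validation loss, alongside a cap on total epochs (150 for the simulation study, 100 for the housing data) as reported above.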