Modern Neural Networks Generalize on Small Data Sets

Authors: Matthew Olson, Abraham Wyner, Richard Berk

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "In this paper, we use a linear program to empirically decompose fitted neural networks into ensembles of low-bias sub-networks. We show that these sub-networks are relatively uncorrelated which leads to an internal regularization process, very much like a random forest, which can explain why a neural network is surprisingly resistant to overfitting. We then demonstrate this in practice by applying large neural networks, with hundreds of parameters per training observation, to a collection of 116 real-world data sets from the UCI Machine Learning Repository."
Researcher Affiliation | Academia | Matthew Olson, Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA 19104, maolson@wharton.upenn.edu; Abraham J. Wyner, Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA 19104, ajw@wharton.upenn.edu; Richard Berk, Department of Statistics, Wharton School, University of Pennsylvania, Philadelphia, PA 19104, berkr@wharton.upenn.edu
Pseudocode | No | The paper describes procedures using mathematical equations and textual explanations, but it does not contain a structured pseudocode block or an explicitly labeled algorithm.
Open Source Code | No | The paper does not provide any explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | Yes | "In this work, we consider a much richer class of small data sets from the UCI Machine Learning Repository in order to study the generalization paradox." and "The collection of data sets we consider were first analyzed in a large-scale study comparing the accuracy of 147 different classifiers [10]."
Dataset Splits | No | The paper states that "All results are reported over 25 randomly chosen 80-20 training-testing splits" and refers to "cross-validated accuracy," but it does not explicitly describe a separate validation split (e.g., a percentage or sample count of validation data) beyond what cross-validation might imply. (See the split-protocol sketch below the table.)
Hardware Specification | No | The paper does not provide specific details regarding the hardware (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions components like the Adam optimizer and ELU activation function, but it does not list specific software dependencies (e.g., programming languages, libraries, frameworks) along with their version numbers required for reproducibility.
Experiment Setup | Yes | "Both networks shared the following architecture and training specifications: 10 hidden layers, 100 nodes per layer, 200 epochs of gradient descent using Adam optimizer with a learning rate of 0.001 [15]. He-initialization for each hidden layer [12], Elu activation function [8]." and "More specifically, one network was fit using dropout with a keep-rate of 0.85, while the other network was fit without explicit regularization." (See the configuration sketch below the table.)
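
For orientation, the following is a minimal sketch of the quoted experiment setup: 10 hidden layers of 100 units, ELU activations, He initialization, optional dropout with a keep-rate of 0.85, and 200 epochs of Adam at learning rate 0.001. PyTorch is an assumed framework here (the paper does not name its software stack), so this is an illustration of the quoted configuration, not the authors' implementation.

```python
# Sketch of the quoted setup: 10 hidden layers x 100 ELU units, He init,
# optional dropout with keep-rate 0.85 (drop probability 0.15),
# 200 epochs of Adam at learning rate 0.001.
# PyTorch is an assumption; the paper does not name a framework.
import torch
import torch.nn as nn

def make_network(n_features, n_classes, use_dropout=False):
    layers = []
    in_dim = n_features
    for _ in range(10):                       # 10 hidden layers
        linear = nn.Linear(in_dim, 100)       # 100 nodes per layer
        # He initialization (kaiming_normal_ only knows ReLU-family gains;
        # used here as the usual stand-in even though the activation is ELU)
        nn.init.kaiming_normal_(linear.weight, nonlinearity="relu")
        layers.append(linear)
        layers.append(nn.ELU())               # ELU activation
        if use_dropout:
            layers.append(nn.Dropout(p=0.15))  # keep-rate 0.85
        in_dim = 100
    layers.append(nn.Linear(in_dim, n_classes))
    return nn.Sequential(*layers)

def train(model, X, y, epochs=200, lr=1e-3):
    # 200 epochs of full-batch gradient descent with Adam, lr = 0.001
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(X), y)
        loss.backward()
        opt.step()
    return model
```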
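
Similarly, the evaluation protocol quoted under Dataset Splits (accuracy averaged over 25 randomly chosen 80-20 training-testing splits, with no separate validation split reported) could be sketched as below. scikit-learn's train_test_split and the fit_and_score callable are illustrative assumptions, not tools named in the paper.

```python
# Sketch of the quoted protocol: test accuracy averaged over 25 randomly
# chosen 80-20 training-testing splits. scikit-learn and the fit_and_score
# helper are illustrative assumptions; the paper does not name its tooling.
import numpy as np
from sklearn.model_selection import train_test_split

def mean_test_accuracy(fit_and_score, X, y, n_splits=25, test_size=0.2):
    # fit_and_score(X_train, y_train, X_test, y_test) -> accuracy on the test split
    scores = []
    for seed in range(n_splits):
        X_train, X_test, y_train, y_test = train_test_split(
            X, y, test_size=test_size, random_state=seed)
        scores.append(fit_and_score(X_train, y_train, X_test, y_test))
    return float(np.mean(scores))
```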