Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Data-Dependent Generalization Bounds for Neural Networks with ReLU

Authors: Harsh Pandey, Amitabha Bagchi, Srikanta J. Bedathur, Arindam Bhattacharya

TMLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Finally, we present experimental evidence to validate our theoretical results." (Abstract) "Then, in Section 6, we experimentally verify the results showing that the bounded condition holds and plot the generalization error." (Section 5) "We perform experiments to validate the results and also empirically show that for the random label case, WCTr G grows unboundedly, and so we can't guarantee generalization in this case, which is as expected."
Researcher Affiliation | Academia | Harsh Pandey (EMAIL), Amitabha Bagchi (EMAIL), Srikanta Bedathur (EMAIL), and Arindam Bhattacharya (EMAIL); all four are affiliated with the Department of Computer Science, IIT Delhi, New Delhi, India.
Pseudocode | No | The paper describes its methodology in prose and mathematical formulations but does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no explicit statement about code release and provides no links to source-code repositories. Mentions of other works providing information-theoretic generalization bounds for SGLD refer to third-party tools, not the authors' own implementation.
Open Datasets | Yes | "For our experiments we use MNIST and Fashion MNIST datasets." (Section 6.1) "We pick images from the 0 and 1 label class of MNIST dataset." (Section 6.2)
Dataset Splits | Yes | "In both datasets, we randomly selected 20,000 training and 1,000 test points." (Section 6.1) "We first split each dataset in a 20:1 ratio into training and validation sets and train the model at varying sizes of training sets." (Section 6.1, Experiment 2) "We then randomly sample a test set T (|T| = 50)." (Section 6.2)
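The splitting protocol quoted above (a fixed-size random train/test selection, plus a 20:1 train/validation split) can be sketched as follows. This is a minimal illustration, not the authors' code: the function names and the dataset sizes in the usage note are assumptions; only the split sizes and the 20:1 ratio come from the quotes.

```python
import numpy as np

def split_fixed(n, n_train, n_test, seed=0):
    """Randomly pick disjoint train/test index sets of fixed sizes
    (e.g. 20,000 training and 1,000 test points, as in Section 6.1)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    return perm[:n_train], perm[n_train:n_train + n_test]

def split_ratio(n, ratio=20, seed=0):
    """Split n indices into train/validation sets in a ratio:1 proportion
    (the 20:1 split of Section 6.1, Experiment 2)."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n)
    n_train = n * ratio // (ratio + 1)
    return perm[:n_train], perm[n_train:]
```

For instance, `split_fixed(60000, 20000, 1000)` reproduces the quoted 20,000/1,000 selection from a 60,000-image pool, and `split_ratio(21000, 20)` yields a 20,000/1,000 train/validation split.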
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, or cloud computing resources.
Software Dependencies | No | The paper mentions software components such as SGD, cross-entropy loss, and ReLU activation, but does not specify version numbers for any programming languages, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | "All experiments were conducted using a fully connected feed-forward neural network with a single hidden layer and ReLU activation. We train the model using SGD (batch size = 1), with cross-entropy loss, starting with randomly initialized weights. As suggested in our analysis we use a decreasing learning rate α_t = α_0 / t." (Section 6.1, Setup) "α_0 = 0.001" (Section 6.1, Experiment 1) "We take a single hidden layer (128 neurons) fully connected neural network having ReLU activation in the hidden layer. We take the loss function as l(ŷ, y) = 1 − Softmax(c·ŷ)_y where c = 6. We use a constant learning rate of 0.003 and a batch size of 8." (Section 6.2, Setup)
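The Section 6.1 setup quoted above can be sketched in plain NumPy. This is a minimal illustration, not the authors' code: the layer sizes, initialization scale, and toy data in the test are assumptions; only the architecture (one hidden ReLU layer), SGD with batch size 1, cross-entropy loss, and the decreasing schedule α_t = α_0 / t come from the quote.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def softmax(z):
    e = np.exp(z - z.max())  # shift for numerical stability
    return e / e.sum()

class OneHiddenLayerNet:
    """Fully connected net with a single ReLU hidden layer (Section 6.1).
    Sizes and the Gaussian init scale are illustrative assumptions."""
    def __init__(self, d_in, d_hidden, d_out, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (d_hidden, d_in))
        self.b1 = np.zeros(d_hidden)
        self.W2 = rng.normal(0.0, 0.1, (d_out, d_hidden))
        self.b2 = np.zeros(d_out)

    def forward(self, x):
        h = relu(self.W1 @ x + self.b1)
        p = softmax(self.W2 @ h + self.b2)
        return h, p

    def sgd_step(self, x, y, lr):
        """One SGD update on a single example (batch size = 1),
        cross-entropy loss; dL/dlogits = p - onehot(y)."""
        h, p = self.forward(x)
        g = p.copy()
        g[y] -= 1.0
        dW2 = np.outer(g, h)
        dh = self.W2.T @ g
        dh[h <= 0] = 0.0              # ReLU gate
        dW1 = np.outer(dh, x)
        self.W2 -= lr * dW2; self.b2 -= lr * g
        self.W1 -= lr * dW1; self.b1 -= lr * dh
        return -np.log(p[y] + 1e-12)  # cross-entropy on this example

def train(net, X, Y, alpha0=0.001, epochs=1):
    t = 0
    for _ in range(epochs):
        for x, y in zip(X, Y):
            t += 1
            lr = alpha0 / t  # decreasing schedule alpha_t = alpha_0 / t, as quoted
            net.sgd_step(x, y, lr)
    return net
```

The Section 6.2 variant differs only in the quoted hyperparameters (128 hidden neurons, constant learning rate 0.003, batch size 8, and the 1 − Softmax(c·ŷ)_y loss with c = 6).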