Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Data-Dependent Generalization Bounds for Neural Networks with ReLU
Authors: Harsh Pandey, Amitabha Bagchi, Srikanta J. Bedathur, Arindam Bhattacharya
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we present experimental evidence to validate our theoretical results. (Abstract) Then, in Section 6, we experimentally verify the results showing that the bounded condition holds and plot the generalization error. (Section 5) We perform experiments to validate the results and also empirically show that for the random label case, WCTr G grows unboundedly, and so we can't guarantee generalization in this case, which is as expected. |
| Researcher Affiliation | Academia | Harsh Pandey, Amitabha Bagchi, Srikanta Bedathur, and Arindam Bhattacharya are each listed with the Department of Computer Science, IIT Delhi, New Delhi, India. |
| Pseudocode | No | The paper describes methodologies in prose and mathematical formulations but does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about code release, nor does it provide links to source code repositories. Mentions of other works providing information-theoretic generalization bounds for SGLD refer to third-party tools, not the authors' own implementation. |
| Open Datasets | Yes | For our experiments we use MNIST and Fashion MNIST datasets. (Section 6.1) We pick images from the 0 and 1 label class of MNIST dataset. (Section 6.2) |
| Dataset Splits | Yes | In both datasets, we randomly selected 20,000 training and 1,000 test points. (Section 6.1) We first split each dataset in a 20:1 ratio into training and validation sets and train the model at varying sizes of training sets. (Section 6.1, Experiment 2) We then randomly sample a test set T (|T| = 50). (Section 6.2) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models, or cloud computing resources. |
| Software Dependencies | No | The paper mentions software components like SGD, cross-entropy loss, and ReLU activation, but does not specify any version numbers for programming languages, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | All experiments were conducted using a fully connected feedforward neural network with a single hidden layer and ReLU activation. We train the model using SGD (batch size = 1), with cross-entropy loss, starting with randomly initialized weights. As suggested in our analysis we use a decreasing learning rate αt = α0/t. (Section 6.1, Setup) α0 = 0.001 (Section 6.1, Experiment 1) We take a single hidden layer (128 neurons) fully connected neural network having ReLU activation in the hidden layer. We take the loss function as l(ŷ, y) = 1 − Softmax(c·ŷ, y) where c = 6. We use a constant learning rate of 0.003 and a batch size of 8. (Section 6.2, Setup) |
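The quoted setup can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, the learning-rate schedule is taken literally as αt = α0/t from Section 6.1, and the Section 6.2 loss is read as 1 minus the softmax probability of the true class after scaling the logits by c = 6 (interpreting the extracted "1 Softmax" as a subtraction).

```python
import math

def lr_schedule(alpha0, t):
    # Decreasing learning rate alpha_t = alpha0 / t, as quoted in
    # Section 6.1 (alpha0 = 0.001 in Experiment 1). t starts at 1.
    return alpha0 / t

def softmax(logits):
    # Numerically stable softmax over a list of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def section_6_2_loss(logits, y, c=6.0):
    # Reading of l(yhat, y) = 1 - Softmax(c * yhat, y): one minus the
    # softmax probability assigned to the true label y, with logits
    # scaled by c = 6. The subtraction is an assumed reconstruction.
    probs = softmax([c * z for z in logits])
    return 1.0 - probs[y]
```

A confident correct prediction drives the loss toward 0 (e.g. `section_6_2_loss([10.0, 0.0], 0)` is near zero), while a uniform prediction over k classes gives 1 − 1/k, so the loss is bounded in [0, 1] like a 0-1 surrogate.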