Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Comparative Generalization Bounds for Deep Neural Networks

Authors: Tomer Galanti, Liane Galanti, Ido Ben-Shaul

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our empirical results demonstrate that, in standard classification settings, neural networks trained using Stochastic Gradient Descent (SGD) tend to have small effective depths. We also explore the relationship between effective depth, the complexity of the training dataset, and generalization. For instance, we find that the effective depth of a trained neural network increases as the proportion of random labels in the data rises. Finally, we derive a generalization bound by comparing the effective depth of a network with the minimal depth required to fit the same dataset with partially corrupted labels. This bound provides nonvacuous predictions of test performance and is found to be empirically independent of the actual depth of the network.
Researcher Affiliation Academia Tomer Galanti EMAIL, Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology; Liane Galanti EMAIL, School of Computer Science, Tel Aviv University; Ido Ben-Shaul EMAIL, Department of Applied Mathematics, Tel Aviv University & eBay Research
Pseudocode No The paper describes methodologies and proofs but does not include any explicitly labeled pseudocode or algorithm blocks. The descriptions of the optimization procedure in Section 2 and of the experimental estimation in Appendix A.2 are given in prose.
Open Source Code No The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets Yes

Dataset        Architecture   Depth (L)      Test error
MNIST          CONV-L-50      10 / 12 / 15   0.0075 / 0.0074 / 0.0074
Fashion MNIST  CONV-L-100     10 / 12 / 15   0.0996 / 0.0996 / 0.0996
CIFAR10        CONV-L-100     16 / 18 / 20   0.2659 / 0.2653 / 0.2648
CIFAR10        CONVRES-L-50   10 / 12 / 15   0.2903 / 0.2862 / 0.2804

... Datasets. We consider various datasets: MNIST, Fashion MNIST, and CIFAR10. For CIFAR10 we used random cropping, random horizontal flips, and random rotations (by 15k degrees for k uniformly sampled from [24]). All datasets were standardized.
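The quoted CIFAR10 augmentation pipeline could be written with torchvision transforms roughly as below. This is a sketch, not the paper's code: the paper names no framework, the rotation bound is garbled in the extracted text (so the ±30° value is a placeholder), and the crop padding and normalization statistics are the usual CIFAR10 conventions, not values taken from the paper.

```python
from torchvision import transforms

# Sketch of the quoted CIFAR10 pipeline; exact parameters (crop padding,
# rotation bound, normalization statistics) are placeholders, not from the paper.
cifar10_train_transform = transforms.Compose([
    transforms.RandomCrop(32, padding=4),      # random cropping
    transforms.RandomHorizontalFlip(),         # random horizontal flips
    transforms.RandomRotation(30),             # "rotations by 15k degrees" (bound is a guess)
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465),   # standardization
                         (0.2470, 0.2435, 0.2616)),
])
```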
Dataset Splits Yes Let S_1 = {(x^1_i, y^1_i)}_{i=1}^m and S_2 = {(x^2_i, y^2_i)}_{i=1}^m be two balanced datasets. We consider them as two splits of the training dataset S, with the classifier h^{W_0}_{S_1} representing the model selected by the learning algorithm using S_1, and S_2 being used to assess its performance. ... To estimate the bound in equation 6 ... we generate k_1 = 5 i.i.d. disjoint splits (S^i_1, S^i_2) of the training data S. For each of these pairs, we generate k_2 = 3 corrupted labelings Y^{ij}_2. We denote by S^{ij}_2 the set obtained by replacing the labels of S^i_2 with Y^{ij}_2, and S^{ij}_3 := S^i_1 ∪ S^{ij}_2.
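The split-and-corrupt protocol quoted above can be sketched in plain Python. The function name and the choice to make the k_1 pairs disjoint from one another are our assumptions; the paper only specifies k_1 = 5 disjoint (S_1, S_2) pairs and k_2 = 3 corrupted labelings per pair.

```python
import random

def make_splits(dataset, k1=5, k2=3, corrupt_frac=1.0, num_classes=10, seed=0):
    """Sketch of the quoted protocol (names are ours, not the paper's):
    draw k1 disjoint (S1, S2) split pairs from S; for each pair, draw k2
    corrupted labelings of S2 and form S3 = S1 ∪ corrupted-S2."""
    rng = random.Random(seed)
    data = list(dataset)
    rng.shuffle(data)
    m = len(data) // (2 * k1)  # each pair uses 2m examples, disjoint across pairs
    runs = []
    for i in range(k1):
        s1 = data[2 * i * m:(2 * i + 1) * m]
        s2 = data[(2 * i + 1) * m:(2 * i + 2) * m]
        for j in range(k2):
            # Replace a fraction of S2's labels with uniformly random ones
            # (a random label may coincide with the true one by chance).
            s2_corrupt = [
                (x, rng.randrange(num_classes)) if rng.random() < corrupt_frac else (x, y)
                for (x, y) in s2
            ]
            runs.append((s1, s2_corrupt, s1 + s2_corrupt))
    return runs
```

With k1 = 5 and k2 = 3 this yields 15 (S1, corrupted-S2, S3) triples per dataset, matching the number of bound estimates implied by the quote.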
Hardware Specification Yes Throughout the experiments, we used Tesla-k80 GPUs for several hundred runs. Each run took between 5 and 20 hours.
Software Dependencies No The paper mentions using Stochastic Gradient Descent (SGD) for optimization but does not specify any software libraries, frameworks, or their version numbers (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes Training process. We consider k-class classification problems (e.g., CIFAR10) and train multilayered neural networks h = f̃_L = g̃_L ∘ ⋯ ∘ g_1 : ℝ^n → ℝ^C on the corresponding training dataset S. The models are trained with SGD for cross-entropy loss minimization between the logits and the one-hot encodings of the labels. We consistently use batch size 128, a learning rate schedule with initial learning rate 0.1 decayed three times by a factor of 0.1 at epochs 60, 120, and 160, momentum 0.9, and weight decay 5e-4. Each model is trained for 500 epochs.
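The quoted step schedule is simple enough to state exactly. The following is a minimal sketch of that rule in plain Python (in PyTorch the same behavior is what `torch.optim.lr_scheduler.MultiStepLR` with milestones [60, 120, 160] and gamma 0.1 provides):

```python
def learning_rate(epoch, base_lr=0.1, milestones=(60, 120, 160), gamma=0.1):
    """Step schedule quoted from the paper: start at 0.1 and decay by a
    factor of 0.1 at each of epochs 60, 120, and 160 (500 epochs total;
    SGD with momentum 0.9 and weight decay 5e-4, batch size 128)."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr
```

So the learning rate is 0.1 for epochs 0-59, 0.01 for 60-119, 0.001 for 120-159, and 0.0001 from epoch 160 through the end of training.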