Gradient Descent Maximizes the Margin of Homogeneous Neural Networks

Authors: Kaifeng Lyu, Jian Li

ICLR 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We conduct several experiments to justify our theoretical finding on MNIST and CIFAR-10 datasets." (Abstract) and, from the Experiments section: "The main practical implication of our theoretical result is that training longer can enlarge the normalized margin. To justify this claim empirically, we train CNNs on MNIST and CIFAR-10 with SGD (see Section K.1)."
Researcher Affiliation | Academia | Kaifeng Lyu & Jian Li, Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China. Contact: vfleaking@gmail.com, lijian83@mail.tsinghua.edu.cn
Pseudocode | No | The paper describes procedures in paragraph form (e.g., in Appendix L.1 for loss-based learning rate scheduling) but does not contain explicitly labeled pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/vfleaking/max-margin
Open Datasets | Yes | "We conduct several experiments to justify our theoretical finding on MNIST and CIFAR-10 datasets."
Dataset Splits | No | The paper mentions training on MNIST and CIFAR-10 and evaluating test accuracy, but it does not explicitly describe validation splits (e.g., percentages, sample counts, or split methodology).
Hardware Specification | No | The paper does not specify the hardware used for its experiments (e.g., GPU/CPU models or processor speeds).
Software Dependencies | No | "We trained two models with Tensorflow." (Section K). The paper mentions TensorFlow but does not give a version number, nor does it list other software dependencies with versions.
Experiment Setup | Yes | "In training the models, we use SGD with batch size 100 without momentum. We initialize all layer weights by He normal initializer (He et al., 2015) and all bias terms by zero." (Section K). "In all our experiments, we set α(0) := 0.1, r_u := 2^(1/5) ≈ 1.149, r_d := 2^(1/10) ≈ 1.072." (Appendix L.1).
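
For illustration only, a minimal TensorFlow/Keras sketch of the quoted training setup (SGD with batch size 100, no momentum, initial learning rate 0.1, He normal weight initialization, zero biases). The CNN architecture below is a placeholder assumption, not the authors' model; the exact models and the loss-based learning rate schedule of Appendix L.1 are in the authors' repository (https://github.com/vfleaking/max-margin) and are not reproduced here.

    import tensorflow as tf

    def build_placeholder_cnn(num_classes=10):
        # Placeholder architecture; the paper's actual CNNs are in the authors' repository.
        init = tf.keras.initializers.HeNormal()  # "He normal initializer"
        return tf.keras.Sequential([
            tf.keras.layers.Conv2D(32, 3, activation="relu",
                                   kernel_initializer=init, bias_initializer="zeros"),
            tf.keras.layers.MaxPooling2D(),
            tf.keras.layers.Flatten(),
            tf.keras.layers.Dense(num_classes,
                                  kernel_initializer=init, bias_initializer="zeros"),
        ])

    model = build_placeholder_cnn()
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.1, momentum=0.0),  # alpha(0) = 0.1, no momentum
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

    (x_train, y_train), _ = tf.keras.datasets.mnist.load_data()
    x_train = (x_train / 255.0)[..., None].astype("float32")  # scale to [0, 1], add channel dim
    model.fit(x_train, y_train, batch_size=100, epochs=1)  # "batch size 100"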