The Implicit Bias of Gradient Descent on Separable Data

Authors: Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Nathan Srebro

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Figure 1: Visualization of our main results on a synthetic dataset in which the L2 max margin vector ŵ is precisely known. (A) The dataset..." "Figure 3: Training of a convolutional neural network on CIFAR10 using stochastic gradient descent with constant learning rate and momentum, softmax output and a cross entropy loss, where we achieve 8.3% final validation error." "Table 1: Sample values from various epochs in the experiment depicted in Fig. 3."
Researcher Affiliation | Academia | "Daniel Soudry, Elad Hoffer, Mor Shpigel Nacson, Department of Electrical Engineering, Technion, Haifa, 320003, Israel... Nathan Srebro, Toyota Technological Institute at Chicago, Chicago, Illinois 60637, USA"
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | "Code available here: https://github.com/paper-submissions/Max_Margin"
Open Datasets | Yes | "Figure 3: Training of a convolutional neural network on CIFAR10 using stochastic gradient descent with constant learning rate and momentum, softmax output and a cross entropy loss, where we achieve 8.3% final validation error."
Dataset Splits | Yes | "The increase in the test loss is practically important because the loss on a validation set is frequently used to monitor progress and decide on stopping. Similar to the population loss, the validation loss Lval(w(t)) = Σ_{x∈V} ℓ(w(t)⊤x), calculated on an independent validation set V, will increase logarithmically with t (since we would not expect zero validation error)..." (This behavior is illustrated in the second sketch after the table.)
Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments.
Software Dependencies | No | The paper mentions optimizers like ADAM and AdaGrad, and implicitly uses frameworks like PyTorch (from the code link), but it does not provide specific version numbers for any software dependencies.
Experiment Setup | Yes | "Implementation details: The dataset includes four support vectors... We used a learning rate η = 1/σmax(X), where σmax(X) is the maximal singular value of X, momentum γ = 0.9 for GDMO, and initialized at the origin." (A minimal sketch of this setup follows the table.)
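
To ground the Experiment Setup row, the following is a minimal sketch of the paper's synthetic gradient-descent experiment, not the authors' released code: the data-generating distribution, dimensions, and iteration counts are illustrative assumptions, while the learning rate η = 1/σmax(X), momentum γ = 0.9 for GDMO, and origin initialization come from the quoted setup.

```python
# Minimal sketch (not the authors' released code) of gradient descent on
# logistic loss over separable data; w(t)/||w(t)|| should approach the
# L2 max-margin direction while ||w(t)|| grows like log t.
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)

# Illustrative separable data: labels are folded into the rows (x_i <- y_i x_i),
# so the loss is (1/n) * sum_i log(1 + exp(-w^T x_i)).
n, d = 100, 2
X = rng.normal(size=(n, d)) + 3.0  # all points on the positive side of w = (1, 1)

eta = 1.0 / np.linalg.norm(X, 2)   # eta = 1 / sigma_max(X), as quoted above
gamma = 0.9                        # momentum for the GDMO variant, as quoted

def grad(w):
    # Gradient of the average logistic loss: -(1/n) * X^T sigmoid(-Xw).
    return -(X.T @ expit(-(X @ w))) / n

# Plain gradient descent, initialized at the origin.
w = np.zeros(d)
for t in range(100_000):
    w -= eta * grad(w)

# Gradient descent with momentum (GDMO), also from the origin.
w_m, v = np.zeros(d), np.zeros(d)
for t in range(100_000):
    v = gamma * v - eta * grad(w_m)
    w_m += v

for name, vec in [("GD", w), ("GDMO", w_m)]:
    print(name, "direction:", vec / np.linalg.norm(vec), "norm:", np.linalg.norm(vec))
```

Both variants should report nearly the same direction: per the paper, w(t)/‖w(t)‖ converges to the L2 max-margin separator while ‖w(t)‖ itself diverges logarithmically, which is why accuracy saturates long before the loss stops moving.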
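
The Dataset Splits row quotes the paper's warning that the validation loss keeps growing even as the direction converges. Here is a hedged illustration of that claim in the same setup; the held-out set V and the deliberately misclassified point are assumptions made for the sketch.

```python
# Sketch of the quoted claim: L_val(w(t)) = sum_{x in V} l(w(t)^T x) grows
# roughly logarithmically in t once some validation point stays misclassified,
# because ||w(t)|| ~ log t multiplies that point's negative margin.
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(0)
n, d = 100, 2
X = rng.normal(size=(n, d)) + 3.0   # training set (as in the sketch above)
V = rng.normal(size=(20, d)) + 3.0  # assumed held-out validation set
V[0] = (-0.5, -0.5)                 # one point no separator of X classifies correctly

eta = 1.0 / np.linalg.norm(X, 2)
w = np.zeros(d)

def val_loss(w):
    # Sum over V of log(1 + exp(-w^T x)), computed stably.
    return np.logaddexp(0.0, -(V @ w)).sum()

for t in range(1, 100_001):
    w += eta * (X.T @ expit(-(X @ w))) / n  # gradient descent step
    if t in (10, 100, 1_000, 10_000, 100_000):
        print(f"t={t:>6d}  val loss={val_loss(w):.3f}")
```

The misclassified point's loss term scales with ‖w(t)‖ ≈ O(log t), so the printed values keep rising even as the training loss vanishes, matching the quoted caution about using the validation loss to decide on stopping.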