Fast Equilibrium of SGD in Generic Situations
Authors: Zhiyuan Li, Yi Wang, Zhiren Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Mixing on local manifold: The key technical observation of this paper (Proposition 4.2) is that the distribution of trajectories with a given initial position is trapped locally in the attracting basin containing the initial position during any practical observation window. Using the method from (Wang & Wang, 2022, Fig. 13), this observation is supported by the experiment below: using a reduced MNIST dataset with only 1280 samples and a small CNN with 1786 parameters (so that the model is still overparametrized), we ran 15 independent instances of SGD, at λ = η = 1/32, for each of two randomly chosen initial parametrizations. |
| Researcher Affiliation | Academia | Zhiyuan Li* (Toyota Technological Institute at Chicago, zhiyuanli@ttic.edu); Yi Wang* (Johns Hopkins University, ywang261@jhu.edu); Zhiren Wang (Pennsylvania State University, zhirenw@psu.edu) |
| Pseudocode | No | The paper is highly theoretical and mathematical, focusing on SDEs and proofs, and does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to any code repositories. |
| Open Datasets | Yes | using a reduced MNIST dataset with only 1280 samples and a small CNN with 1786 parameters (so that the model is still overparametrized), we ran 15 independent instances of SGD, at λ = η = 1/32, for each of two randomly chosen initial parametrizations. A similar experiment was run for the reduced CIFAR10 dataset with 1280 samples, a CNN model with 2658 parameters, η = 1/1024, λ = 1/32, and 1.28 million SGD steps. |
| Dataset Splits | No | The paper describes experiments on 'reduced MNIST dataset' and 'reduced CIFAR10 dataset' and mentions 'training steps', but it does not specify explicit training/validation/test dataset splits or cross-validation methodology. |
| Hardware Specification | No | The paper describes the experimental setup including datasets, model sizes, and SGD parameters, but it does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, specific deep learning frameworks like PyTorch or TensorFlow with their versions, or other libraries). |
| Experiment Setup | Yes | we ran 15 independent instances of SGD, at λ = η = 1/32, for each of two randomly chosen initial parametrizations. Each instance lasts 0.8 million steps of SGD. A similar experiment was run for the reduced CIFAR10 dataset with 1280 samples, a CNN model with 2658 parameters, η = 1/1024, λ = 1/32, and 1.28 million SGD steps. (A code sketch of this setup appears after the table.) |
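
The experiment setup quoted above translates naturally into code. Below is a minimal sketch, assuming a standard PyTorch/torchvision environment: the CNN architecture, batch size, and data-loading details are assumptions not fixed by the paper, which specifies only the sample count (1280), the approximate parameter count, η = λ = 1/32, and 15 independent SGD runs of 0.8 million steps for each of two initializations (step count shortened here for illustration). The CIFAR10 variant would be analogous, with η = 1/1024 and 1.28 million steps.

```python
# Minimal sketch of the described setup (hypothetical architecture and
# hyperparameters beyond those quoted from the paper).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

def make_small_cnn() -> nn.Module:
    # Hypothetical small CNN; channel counts would need tuning to reach
    # the ~1786 parameters reported in the paper.
    return nn.Sequential(
        nn.Conv2d(1, 4, 3, stride=2), nn.ReLU(),
        nn.Conv2d(4, 8, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(8, 10),
    )

def run_sgd_instance(init_state, data, steps, eta=1/32, lam=1/32, seed=0):
    """One independent SGD instance from a shared initial parametrization."""
    torch.manual_seed(seed)          # independent minibatch noise per run
    model = make_small_cnn()
    model.load_state_dict(init_state)  # same initialization across runs
    # lambda enters as weight decay in plain SGD
    opt = torch.optim.SGD(model.parameters(), lr=eta, weight_decay=lam)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(data, batch_size=32, shuffle=True)
    it = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:        # restart the epoch
            it = iter(loader)
            x, y = next(it)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model.state_dict()

# Reduced MNIST: 1280 samples, as in the paper.
mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
reduced = Subset(mnist, range(1280))

# Two randomly chosen initializations, 15 independent SGD instances each.
# The paper uses 0.8 million steps; shortened here for illustration.
for init_id in range(2):
    torch.manual_seed(init_id)
    init_state = make_small_cnn().state_dict()
    finals = [run_sgd_instance(init_state, reduced, steps=1000, seed=run)
              for run in range(15)]
```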