Fast Equilibrium of SGD in Generic Situations
Authors: Zhiyuan Li, Yi Wang, Zhiren Wang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Mixing on local manifold: The key technical observation of this paper (Proposition 4.2) is that the distribution of trajectories with a given initial position is trapped locally in the attracting basin containing the initial position during any practical observation window. Using the method from (Wang & Wang, 2022, Fig. 13), this observation is supported by the experiment below: using a reduced MNIST dataset with only 1280 samples and a small CNN with 1786 parameters (so that the model is still overparametrized), we ran 15 independent instances of SGD, at λ = η = 1/32, for each of two randomly chosen initial parametrizations. |
| Researcher Affiliation | Academia | Zhiyuan Li* (Toyota Technological Institute at Chicago, zhiyuanli@ttic.edu); Yi Wang* (Johns Hopkins University, ywang261@jhu.edu); Zhiren Wang (Pennsylvania State University, zhirenw@psu.edu) |
| Pseudocode | No | The paper is highly theoretical and mathematical, focusing on SDEs and proofs, and does not contain any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing source code for the described methodology, nor does it provide links to any code repositories. |
| Open Datasets | Yes | using a reduced MNIST dataset with only 1280 samples and a small CNN with 1786 parameters (so that the model is still overparametrized), we ran 15 independent instances of SGD, at λ = η = 1/32, for each of two randomly chosen initial parametrizations. A similar experiment was run for the reduced CIFAR10 dataset with 1280 samples, a CNN model with 2658 parameters, η = 1/1024, λ = 1/32, and 1.28 million SGD steps. |
| Dataset Splits | No | The paper describes experiments on 'reduced MNIST dataset' and 'reduced CIFAR10 dataset' and mentions 'training steps', but it does not specify explicit training/validation/test dataset splits or cross-validation methodology. |
| Hardware Specification | No | The paper describes the experimental setup including datasets, model sizes, and SGD parameters, but it does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used to run the experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, specific deep learning frameworks like PyTorch or TensorFlow with their versions, or other libraries). |
| Experiment Setup | Yes | we ran 15 independent instances of SGD, at λ = η = 1/32, for each of two randomly chosen initial parametrizations. Each instance lasts 0.8 million steps of SGD. A similar experiment was run for the reduced CIFAR10 dataset with 1280 samples, a CNN model with 2658 parameters, η = 1/1024, λ = 1/32, and 1.28 million SGD steps. (A code sketch of this setup appears after the table.) |
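
The experiment setup quoted above translates naturally into code. Below is a minimal sketch, assuming a standard PyTorch/torchvision environment: the CNN architecture, batch size, and data-loading details are assumptions not fixed by the paper, which specifies only the sample count (1280), the approximate parameter count, η = λ = 1/32, and 15 independent SGD runs of 0.8 million steps for each of two initializations (step count shortened here for illustration). The CIFAR10 variant would be analogous, with η = 1/1024 and 1.28 million steps.

```python
# Minimal sketch of the described setup (hypothetical architecture and
# hyperparameters beyond those quoted from the paper).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, Subset
from torchvision import datasets, transforms

def make_small_cnn() -> nn.Module:
    # Hypothetical small CNN; channel counts would need tuning to reach
    # the ~1786 parameters reported in the paper.
    return nn.Sequential(
        nn.Conv2d(1, 4, 3, stride=2), nn.ReLU(),
        nn.Conv2d(4, 8, 3, stride=2), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(8, 10),
    )

def run_sgd_instance(init_state, data, steps, eta=1/32, lam=1/32, seed=0):
    """One independent SGD instance from a shared initial parametrization."""
    torch.manual_seed(seed)          # independent minibatch noise per run
    model = make_small_cnn()
    model.load_state_dict(init_state)  # same initialization across runs
    # lambda enters as weight decay in plain SGD
    opt = torch.optim.SGD(model.parameters(), lr=eta, weight_decay=lam)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(data, batch_size=32, shuffle=True)
    it = iter(loader)
    for _ in range(steps):
        try:
            x, y = next(it)
        except StopIteration:        # restart the epoch
            it = iter(loader)
            x, y = next(it)
        opt.zero_grad()
        loss_fn(model(x), y).backward()
        opt.step()
    return model.state_dict()

# Reduced MNIST: 1280 samples, as in the paper.
mnist = datasets.MNIST("data", train=True, download=True,
                       transform=transforms.ToTensor())
reduced = Subset(mnist, range(1280))

# Two randomly chosen initializations, 15 independent SGD instances each.
# The paper uses 0.8 million steps; shortened here for illustration.
for init_id in range(2):
    torch.manual_seed(init_id)
    init_state = make_small_cnn().state_dict()
    finals = [run_sgd_instance(init_state, reduced, steps=1000, seed=run)
              for run in range(15)]
```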