Stochastic Gradient and Langevin Processes
Authors: Xiang Cheng, Dong Yin, Peter Bartlett, Michael Jordan
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our theoretical findings to studying the convergence of Stochastic Gradient Descent (SGD) for non-convex problems and corroborate them with experiments using SGD to train deep neural networks on the CIFAR-10 dataset. |
| Researcher Affiliation | Academia | Department of Electrical Engineering and Computer Science, University of California, Berkeley. Correspondence to: Xiang Cheng <x.cheng@berkeley.edu>. |
| Pseudocode | No | The paper describes algorithms and mathematical processes in text and equations, but does not include formal pseudocode blocks or algorithms labeled as such. |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository for the methodology described. |
| Open Datasets | Yes | In all experiments, we use two different neural network architectures on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) with the standard test-train split. |
| Dataset Splits | Yes | In all experiments, we use two different neural network architectures on the CIFAR-10 dataset (Krizhevsky & Hinton, 2009) with the standard test-train split. |
| Hardware Specification | No | The paper does not specify any particular hardware details such as GPU models, CPU types, or memory used for the experiments. It only mentions using 'deep neural networks'. |
| Software Dependencies | No | The paper does not specify any software names with version numbers that would be necessary to reproduce the experiments. |
| Experiment Setup | Yes | In all of our experiments, we run SGD algorithm 2000 epochs such that the algorithm converges sufficiently. ... We choose constant step size δ from {0.001, 0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128} and minibatch size b from {32, 64, 128, 256, 512}. ... we do not use batch normalization or dropout, and use constant step size. |
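
Since the paper releases no code, the following is a minimal PyTorch sketch of the setup quoted in the Open Datasets and Experiment Setup rows above: CIFAR-10 with the standard train/test split, plain SGD with a constant step size, no batch normalization or dropout. The network architecture and the particular (step size, minibatch size) pair used here are illustrative assumptions, not the authors' exact configuration.

```python
# Hedged sketch of the experimental setup described in the paper; the architecture
# and the chosen (step size, batch size) pair below are assumptions.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

# Hyperparameter grids quoted from the paper; one pair is picked for illustration.
STEP_SIZES = [0.001, 0.002, 0.004, 0.008, 0.016, 0.032, 0.064, 0.128]
BATCH_SIZES = [32, 64, 128, 256, 512]
delta, b = STEP_SIZES[3], BATCH_SIZES[2]  # e.g. step size 0.008, minibatch size 128

# CIFAR-10 with the standard train/test split.
transform = T.Compose([T.ToTensor()])
train_set = torchvision.datasets.CIFAR10(root="./data", train=True,
                                         download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root="./data", train=False,
                                        download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=b, shuffle=True)

# Simple feed-forward network with no batch normalization or dropout
# (the paper's two architectures are not reproduced here; this is a placeholder).
model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(3 * 32 * 32, 512), nn.ReLU(),
    nn.Linear(512, 512), nn.ReLU(),
    nn.Linear(512, 10),
)
criterion = nn.CrossEntropyLoss()

# Plain SGD with a constant step size delta (no momentum, weight decay, or schedule).
optimizer = torch.optim.SGD(model.parameters(), lr=delta)

for epoch in range(2000):  # "2000 epochs such that the algorithm converges sufficiently"
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```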