Fluctuation-dissipation relations for stochastic gradient descent
Authors: Sho Yaida
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our claims are empirically verified. |
| Researcher Affiliation | Industry | Sho Yaida, Facebook AI Research, Facebook Inc., Menlo Park, California 94025, USA, shoyaida@fb.com |
| Pseudocode | No | The paper describes algorithms and equations but does not include a clearly labeled "Pseudocode" or "Algorithm" block. |
| Open Source Code | No | No explicit statement about releasing source code or a link to a code repository was found. |
| Open Datasets | Yes | a multilayer perceptron (MLP) learning patterns in the MNIST training data (LeCun et al., 1998) through SGD without momentum and a convolutional neural network (CNN) learning patterns in the CIFAR-10 training data (Krizhevsky & Hinton, 2009) |
| Dataset Splits | No | The paper mentions training and test data but does not explicitly describe a validation dataset split or a methodology for it. |
| Hardware Specification | No | No specific hardware details such as GPU/CPU models, processors, or memory specifications used for running experiments were provided. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | For both models, the mini-batch size is set to be \|B\| = 100, and the training data are shuffled at each epoch... the L2-regularization term (1/2)λθ² with the weight decay λ = 0.01 is included in the loss function f. The MLP is initialized through the Xavier method (Glorot & Bengio, 2010) and trained for t̂_total = 100 epochs with the learning rate η = 0.1. |
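
The reported setup (mini-batch size 100, SGD without momentum, learning rate 0.1, weight decay 0.01, Xavier initialization, 100 epochs on MNIST with per-epoch reshuffling) can be captured in a short training-loop sketch. The paper does not state a software framework or the MLP's hidden-layer widths, so the PyTorch usage and the layer sizes below are assumptions for illustration, not the authors' implementation.

```python
# A minimal sketch, assuming PyTorch, of the quoted MNIST/MLP configuration:
# SGD without momentum, lr = 0.1, weight decay = 0.01, |B| = 100,
# Xavier initialization, data reshuffled each epoch, 100 epochs.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Hidden width 512 is an assumption; the excerpt does not specify the architecture.
mlp = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 512),
    nn.ReLU(),
    nn.Linear(512, 10),
)
for module in mlp.modules():
    if isinstance(module, nn.Linear):
        nn.init.xavier_uniform_(module.weight)  # Xavier method (Glorot & Bengio, 2010)
        nn.init.zeros_(module.bias)

train_loader = DataLoader(
    datasets.MNIST("data", train=True, download=True, transform=transforms.ToTensor()),
    batch_size=100,   # mini-batch size |B| = 100
    shuffle=True,     # training data reshuffled at each epoch
)

# SGD without momentum; weight_decay implements the (1/2) * lambda * theta^2 term, lambda = 0.01
optimizer = torch.optim.SGD(mlp.parameters(), lr=0.1, momentum=0.0, weight_decay=0.01)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(100):  # trained for t_total = 100 epochs
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = loss_fn(mlp(images), labels)
        loss.backward()
        optimizer.step()
```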