Low-Precision Stochastic Gradient Langevin Dynamics
Authors: Ruqi Zhang, Andrew Gordon Wilson, Christopher De Sa
ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the generalization accuracy and uncertainty estimation of low-precision SGLD with full-precision gradient accumulators (SGLDLP-F) and with variance-corrected low-precision gradient accumulators (VC SGLDLP-L) on a logistic regression and a multilayer perceptron on the MNIST dataset (Section 5.1), ResNet-18 on CIFAR datasets and an LSTM on the IMDB dataset (Section 5.2), and ResNet-18 on the ImageNet dataset (Section 5.3). |
| Researcher Affiliation | Academia | 1 The University of Texas at Austin, 2 New York University, 3 Cornell University. |
| Pseudocode | Yes | Algorithm 1 Variance-Corrected Low-Precision SGLD (VC SGLDLP-L). |
| Open Source Code | No | Our code is available here. - The sentence says "here," but no actual link to the code is provided in the document. |
| Open Datasets | Yes | on a logistic regression and a multilayer perceptron on the MNIST dataset (Section 5.1), ResNet-18 on CIFAR datasets and an LSTM on the IMDB dataset (Section 5.2), and ResNet-18 on the ImageNet dataset (Section 5.3). |
| Dataset Splits | No | The paper mentions training and testing but does not explicitly provide details about validation dataset splits (e.g., percentages, sample counts, or a dedicated validation set description). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | We use qtorch (Zhang et al., 2019) to simulate low-precision training on these experiments. No specific version numbers for software libraries or dependencies are provided. (A hedged sketch of a quantized SGLD update appears below the table.) |
| Experiment Setup | Yes | For CIFAR datasets, we use batch size 128, learning rate 0.5 and weight decay 5e-4. We train the model for 245 epochs... and For IMDB dataset, we use batch size 80, learning rate 0.3 and weight decay 5e-4. We use a two-layer LSTM. ... We train the model for 50 epochs... (A hedged reconstruction of the CIFAR configuration also follows the table.) |
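
The table quotes Algorithm 1 (VC SGLDLP-L) and the use of qtorch to simulate low-precision training. Below is a minimal, hedged sketch of one low-precision SGLD step with a full-precision gradient accumulator (the SGLDLP-F variant described in the paper). The quantizer, function names, and bit-width arguments are illustrative assumptions rather than the paper's or qtorch's actual API, and Algorithm 1's variance correction is not reproduced here.

```python
import torch

def stochastic_round_quantize(x, wl=8, fl=6):
    # Hypothetical fixed-point quantizer with stochastic rounding:
    # `wl` total bits, `fl` fractional bits (a stand-in for qtorch's routines).
    delta = 2.0 ** (-fl)                      # grid spacing
    upper = 2.0 ** (wl - fl - 1) - delta      # largest representable value
    lower = -2.0 ** (wl - fl - 1)             # smallest representable value
    scaled = x / delta
    floored = torch.floor(scaled)
    prob = scaled - floored                   # probability of rounding up
    rounded = floored + torch.bernoulli(prob)
    return torch.clamp(rounded * delta, lower, upper)

def sgldlp_f_step(theta_fp, grad, lr, temperature=1.0):
    # One SGLDLP-F-style step: update the full-precision copy of the weights
    # with the SGLD rule, then quantize the result for the next forward pass.
    noise = torch.randn_like(theta_fp) * (2.0 * lr * temperature) ** 0.5
    theta_fp = theta_fp - lr * grad + noise   # full-precision gradient accumulator
    theta_lp = stochastic_round_quantize(theta_fp)
    return theta_fp, theta_lp
```

Keeping the accumulator in full precision is what distinguishes SGLDLP-F from SGLDLP-L, where the weights themselves are stored in low precision and the paper's variance correction compensates for the additional rounding noise.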
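
The quoted CIFAR hyperparameters can likewise be collected into a training configuration. This is only an illustrative reconstruction under assumptions: the paper's ResNet-18 may differ from torchvision's, and the optimizer shown is plain SGD rather than the paper's low-precision SGLD sampler.

```python
import torch
import torchvision

# Quoted hyperparameters for the CIFAR runs (batch size 128, learning rate 0.5,
# weight decay 5e-4, 245 epochs); everything else here is an assumption.
cifar_setup = {"batch_size": 128, "learning_rate": 0.5,
               "weight_decay": 5e-4, "epochs": 245}

model = torchvision.models.resnet18(num_classes=10)  # CIFAR-10 class count assumed
optimizer = torch.optim.SGD(model.parameters(),
                            lr=cifar_setup["learning_rate"],
                            weight_decay=cifar_setup["weight_decay"])
```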