Variance Reduction in Stochastic Gradient Langevin Dynamics

Authors: Kumar Avinava Dubey, Sashank J. Reddi, Sinead A. Williamson, Barnabas Poczos, Alexander J. Smola, Eric P. Xing

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | This is complemented by impressive empirical results obtained on a variety of real world datasets, and on four different machine learning tasks (regression, classification, independent component analysis and mixture modeling).
Researcher Affiliation | Academia | Avinava Dubey, Sashank J. Reddi, Barnabás Póczos, Alexander J. Smola, Eric P. Xing, Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA 15213, {akdubey, sjakkamr, bapoczos, alex, epxing}@cs.cmu.edu; Sinead A. Williamson, IROM/Statistics and Data Science, University of Texas at Austin, Austin, TX 78712, sinead.williamson@mccombs.utexas.edu
Pseudocode | Yes | Algorithm 1: SAGA-LD (see the sketch after this table)
Open Source Code | No | The paper does not provide concrete access to source code for the methodology described.
Open Datasets | Yes | We ran experiments on 11 standard UCI regression datasets, summarized in Table 1. The datasets can be downloaded from https://archive.ics.uci.edu/ml/index.html. We used a standard ICA dataset for our experiment... The dataset can be downloaded from https://www.cis.hut.fi/projects/ica/eegmeg/MEG_data.html.
Dataset Splits | Yes | In each case, we set the prior precision λ = 1, and we partitioned our dataset into training (70%), validation (10%), and test (20%) sets. The validation set is used to select the step size parameters, and we report the mean square error (MSE) evaluated on the test set, using 5-fold cross-validation.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory) used for running its experiments.
Software Dependencies | No | The paper does not provide specific ancillary software details with version numbers (e.g., library or solver names with version numbers).
Experiment Setup | Yes | In all our experiments, we use a decreasing step size for SGLD as suggested by [15]. In particular, we use ϵ_t = a(b + t)^{-γ}, where the parameters a, b and γ are chosen for each dataset to give the best performance of the algorithm on that particular dataset. For SAGA-LD, due to the benefit of variance reduction, we use a simple two-phase constant step size selection strategy. The minibatch size, n, in both SGLD and SAGA-LD is held at a constant value of 10 throughout our experiments. All algorithms are initialized to the same point and the same sequence of minibatches is pre-generated and used in both algorithms.
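To make the quoted Algorithm 1 (SAGA-LD) and step-size schedules concrete, below is a minimal NumPy sketch of a SAGA-style variance-reduced Langevin update, written from the descriptions above. The function names, the generic grad_log_prior/grad_log_lik interface, and the placeholder constants in the example schedule are assumptions of this sketch, not code released by the authors.

```python
# Hedged sketch of a SAGA-style variance-reduced SGLD step (cf. Algorithm 1,
# SAGA-LD). Model interface and constants are illustrative assumptions.
import numpy as np

def saga_ld(theta0, data, grad_log_prior, grad_log_lik,
            step_size, n_iters=1000, minibatch_size=10, rng=None):
    """Draw approximate posterior samples with SAGA-style variance-reduced SGLD.

    grad_log_lik(theta, x_i) returns the gradient of log p(x_i | theta) for a
    single data point; step_size(t) returns the step size at iteration t
    (e.g. a constant, or the a(b + t)^(-gamma) schedule quoted above).
    """
    rng = np.random.default_rng() if rng is None else rng
    N = len(data)
    theta = np.asarray(theta0, dtype=float)

    # Table of stored per-example gradients (the SAGA memory), initialised at
    # theta0, together with their running sum.
    g = np.stack([grad_log_lik(theta, x) for x in data])
    g_sum = g.sum(axis=0)

    samples = []
    for t in range(n_iters):
        eps = step_size(t)
        idx = rng.choice(N, size=minibatch_size, replace=False)

        # Variance-reduced gradient estimate:
        # grad log p(theta) + (N/n) * sum_{i in I}(new_i - old_i) + sum_j g_j
        new = np.stack([grad_log_lik(theta, data[i]) for i in idx])
        grad_est = (grad_log_prior(theta)
                    + (N / minibatch_size) * (new - g[idx]).sum(axis=0)
                    + g_sum)

        # Langevin update: half-step along the gradient estimate plus Gaussian
        # noise with variance eps.
        theta = (theta + 0.5 * eps * grad_est
                 + rng.normal(scale=np.sqrt(eps), size=theta.shape))

        # Refresh the stored gradients for the sampled minibatch.
        g_sum += (new - g[idx]).sum(axis=0)
        g[idx] = new

        samples.append(theta.copy())
    return np.array(samples)

# Example step-size schedule in the decreasing form quoted above; the values
# of a, b and gamma are placeholders (the paper tunes them per dataset).
decreasing_step = lambda t, a=1e-3, b=10.0, gamma=0.55: a * (b + t) ** (-gamma)
```

The stored-gradient table is what distinguishes this from plain SGLD: only the minibatch gradients are recomputed each iteration, while the remaining terms reuse cached values, which is the source of the variance reduction the paper describes.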