Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling

Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin (pp. 7381-7389)

AAAI 2022

Reproducibility Variable Result LLM Response
Research Type Experimental In this section we investigate the numerical performance of the random scaling method via Monte Carlo experiments. We consider two baseline models: linear regression and logistic regression.
Researcher Affiliation Academia (1) Department of Economics, Columbia University, New York, NY 10027, USA; (2) Department of Economics, Rutgers University, New Brunswick, NJ 08901, USA; (3) Department of Economics, Seoul National University, Seoul 08826, Korea; (4) Department of Economics, McMaster University, Hamilton, ON L8S 4L8, Canada
Pseudocode Yes Algorithm 1: Online Inference with SGD via Random Scaling
Open Source Code Yes The replication code is available at https://github.com/SGDinference-Lab/AAAI-22.
Open Datasets No The data for the linear regression are generated from y_t = x_t'β + ε_t for t = 1, . . . , n, where x_t is a d-dimensional vector of covariates generated from the multivariate normal distribution N(0, I_d), ε_t is drawn from N(0, 1), and β is equi-spaced on the interval [0, 1].
Dataset Splits No The paper does not provide specific details on training, validation, or test dataset splits, percentages, or counts.
Hardware Specification Yes We use the Compute Canada Graham cluster, composed of Intel CPUs (Broadwell, Skylake, and Cascade Lake at 2.1-2.5GHz), with 3GB of memory assigned.
Software Dependencies No The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers with their explicit versions).
Experiment Setup Yes We consider different combinations of the learning rate γ_t = γ_0 t^{-a}, setting γ_0 = 0.5, 1 and a = 0.505, 0.667. The sample size is set to n = 100000. The initial value β_0 is set to zero. In the case of d = 20, we burn in around 1% of observations and start estimating β_t from t = 1000.
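The linear-regression Monte Carlo design quoted above (x_t ~ N(0, I_d), ε_t ~ N(0, 1), β equi-spaced on [0, 1]) is simple to reproduce. A minimal NumPy sketch, with a function name and seed handling of my own choosing rather than anything taken from the authors' replication code:

```python
import numpy as np

def make_linear_data(n=100_000, d=5, seed=0):
    """Simulate the linear-regression design described above:
    x_t ~ N(0, I_d), eps_t ~ N(0, 1), beta equi-spaced on [0, 1]."""
    rng = np.random.default_rng(seed)
    beta = np.linspace(0.0, 1.0, d)       # equi-spaced true coefficients
    X = rng.standard_normal((n, d))       # covariates from N(0, I_d)
    y = X @ beta + rng.standard_normal(n) # outcomes with N(0, 1) noise
    return X, y, beta
```

With d = 5 this yields true coefficients (0, 0.25, 0.5, 0.75, 1); the logistic-regression baseline would replace the Gaussian outcome equation with a Bernoulli draw.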
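Algorithm 1 (online inference with SGD via random scaling) can be sketched for the linear-regression case as follows. This is an illustrative reconstruction from the descriptions above, not the authors' replication code: the function name, the recursive bookkeeping for the random-scaling matrix V̂_n = n^{-2} Σ_s s²(β̄_s − β̄_n)(β̄_s − β̄_n)', and the 6.747 critical value mentioned in the closing comment (the commonly cited 95% asymptotic critical value for random-scaling t-statistics) are assumptions here, and burn-in is omitted for brevity.

```python
import numpy as np

def sgd_random_scaling(X, y, gamma0=0.5, a=0.505):
    """Online SGD for linear regression with a random-scaling
    variance estimate, updated in a single pass over the data."""
    n, d = X.shape
    beta = np.zeros(d)       # SGD iterate, initialised at zero
    bbar = np.zeros(d)       # running Polyak-Ruppert average
    A = np.zeros((d, d))     # running sum of s^2 * bbar_s bbar_s'
    b = np.zeros(d)          # running sum of s^2 * bbar_s
    c = 0.0                  # running sum of s^2
    for t in range(1, n + 1):
        xt, yt = X[t - 1], y[t - 1]
        grad = xt * (xt @ beta - yt)           # least-squares gradient
        beta = beta - gamma0 * t ** (-a) * grad  # step size gamma_0 t^{-a}
        bbar = bbar + (beta - bbar) / t          # update running average
        A += t ** 2 * np.outer(bbar, bbar)
        b += t ** 2 * bbar
        c += t ** 2
    # V_hat = n^{-2} sum_s s^2 (bbar_s - bbar_n)(bbar_s - bbar_n)',
    # expanded so it only needs the accumulated A, b, c.
    V = (A - np.outer(b, bbar) - np.outer(bbar, b)
         + c * np.outer(bbar, bbar)) / n ** 2
    return bbar, V

# A 95% interval for coordinate j would then be (an assumption here):
#   bbar[j] +/- 6.747 * np.sqrt(V[j, j] / n)
```

The appeal of this construction is that everything is updated recursively, so inference requires only O(d²) memory regardless of the sample size, consistent with the "online" framing of the paper.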