Fast and Robust Online Inference with Stochastic Gradient Descent via Random Scaling
Authors: Sokbae Lee, Yuan Liao, Myung Hwan Seo, Youngki Shin
AAAI 2022, pp. 7381–7389
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we investigate the numerical performance of the random scaling method via Monte Carlo experiments. We consider two baseline models: linear regression and logistic regression. |
| Researcher Affiliation | Academia | 1. Department of Economics, Columbia University, New York, NY 10027, USA; 2. Department of Economics, Rutgers University, New Brunswick, NJ 08901, USA; 3. Department of Economics, Seoul National University, Seoul 08826, Korea; 4. Department of Economics, McMaster University, Hamilton, ON L8S 4L8, Canada |
| Pseudocode | Yes | Algorithm 1: Online Inference with SGD via Random Scaling (a minimal sketch appears after the table) |
| Open Source Code | Yes | The replication code is available at https://github.com/SGDinference-Lab/AAAI-22. |
| Open Datasets | No | The data for the linear regression are generated from y_t = x_t'β + ε_t for t = 1, ..., n, where x_t is a d-dimensional vector of covariates generated from the multivariate normal distribution N(0, I_d), ε_t is from N(0, 1), and β is equi-spaced on the interval [0, 1]. |
| Dataset Splits | No | The paper does not provide specific details on training, validation, or test dataset splits, percentages, or counts. |
| Hardware Specification | Yes | We use the Compute Canada Graham cluster, composed of Intel CPUs (Broadwell, Skylake, and Cascade Lake at 2.1GHz–2.5GHz), each assigned 3GB of memory. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., programming languages, libraries, or solvers with their explicit versions). |
| Experiment Setup | Yes | We consider different combinations of the learning rate γ_t = γ_0 t^(−a), setting γ_0 = 0.5, 1 and a = 0.505, 0.667. The sample size is set to n = 100,000. The initial value β_0 is set to zero. In the case of d = 20, we burn in around 1% of observations and start estimating β_t from t = 1000. (See the usage sketch after the table.) |
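
The Pseudocode row cites Algorithm 1, which maintains the SGD iterate, its running average β̄_t, and a random-scaling matrix V̂_t = (1/t^2) Σ_{s=1}^t s^2 (β̄_s − β̄_t)(β̄_s − β̄_t)' that can be updated online through the running sums A_t = Σ s^2 β̄_s β̄_s', b_t = Σ s^2 β̄_s, and c_t = Σ s^2. The sketch below is our minimal Python rendering of that recursion for the linear regression case; the function name and the expansion into A, b, c are our choices, not the authors' replication code.

```python
import numpy as np

def sgd_random_scaling(X, y, gamma0=0.5, a=0.505):
    """Averaged SGD for linear regression with an online random-scaling
    covariance estimate (a sketch of Algorithm 1, not the authors' code)."""
    n, d = X.shape
    beta = np.zeros(d)        # initial value beta_0 = 0, as in the setup row
    beta_bar = np.zeros(d)    # running average of the SGD iterates
    A = np.zeros((d, d))      # running sum of s^2 * outer(beta_bar_s, beta_bar_s)
    b = np.zeros(d)           # running sum of s^2 * beta_bar_s
    c = 0.0                   # running sum of s^2
    for t in range(1, n + 1):
        x_t, y_t = X[t - 1], y[t - 1]
        grad = x_t * (x_t @ beta - y_t)          # least-squares gradient
        beta = beta - gamma0 * t ** (-a) * grad  # learning rate gamma_0 * t^(-a)
        beta_bar += (beta - beta_bar) / t        # online mean update
        A += t * t * np.outer(beta_bar, beta_bar)
        b += t * t * beta_bar
        c += t * t
    # Expand sum_s s^2 (beta_bar_s - beta_bar_n)(beta_bar_s - beta_bar_n)'
    # in terms of the running sums A, b, c.
    V = (A - np.outer(b, beta_bar) - np.outer(beta_bar, b)
         + c * np.outer(beta_bar, beta_bar)) / (n * n)
    return beta_bar, V
```

Keeping only A, b, and c means the covariance estimate needs O(d^2) memory regardless of the stream length, which is what makes the procedure online.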
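The Open Datasets and Experiment Setup rows describe a fully simulated design, so the linear regression experiment can be re-run from the DGP alone. A hedged usage sketch follows, assuming the sgd_random_scaling function above; the seed is our choice (the paper reports none), and the 1% burn-in used for d = 20 is omitted for brevity. The critical value 6.747 is the 97.5% quantile of the nonstandard pivotal limit of the random-scaling t-statistic (Abadir and Paruolo 1997), replacing the usual normal value 1.96.

```python
import numpy as np

rng = np.random.default_rng(2022)   # seed is our choice; the paper reports none
n, d = 100_000, 20                  # values from the Experiment Setup row
beta_true = np.linspace(0, 1, d)    # beta equi-spaced on [0, 1]
X = rng.standard_normal((n, d))     # x_t ~ N(0, I_d): iid standard normal coordinates
y = X @ beta_true + rng.standard_normal(n)   # eps_t ~ N(0, 1)

beta_bar, V = sgd_random_scaling(X, y, gamma0=0.5, a=0.505)

# 95% confidence interval per coordinate: beta_bar_j +/- 6.747 * sqrt(V_jj / n),
# using the nonstandard critical value rather than 1.96.
half = 6.747 * np.sqrt(np.diag(V) / n)
print(np.column_stack([beta_bar - half, beta_bar + half]))
```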