Scaled Least Squares Estimator for GLMs in Large-Scale Problems

Authors: Murat A. Erdogdu, Lee H. Dicker, Mohsen Bayati

NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Finally, we demonstrate the performance of our algorithm through extensive numerical studies on large-scale real and synthetic datasets, and show that it achieves the highest performance compared to several other widely used optimization algorithms.
Researcher Affiliation Collaboration Murat A. Erdogdu, Department of Statistics, Stanford University, erdogdu@stanford.edu; Mohsen Bayati, Graduate School of Business, Stanford University, bayati@stanford.edu; Lee H. Dicker, Department of Statistics and Biostatistics, Rutgers University and Amazon, ldicker@stat.rutgers.edu
Pseudocode Yes Algorithm 1 (SLS: Scaled Least Squares Estimator)
Input: Data (y_i, x_i), i = 1, ..., n.
Step 1. Compute the least squares estimator \hat{\beta}^{ols} and \hat{y} = X \hat{\beta}^{ols}. For a sub-sampling based OLS estimator, let S \subset [n] be a random subset and take \hat{\beta}^{ols} = (|S|/n) (X_S^T X_S)^{-1} X^T y.
Step 2. Solve the following equation for c \in \mathbb{R}: 1 = (c/n) \sum_{i=1}^n \Psi^{(2)}(c \hat{y}_i). Use Newton's root-finding method: initialize c = 2/\mathrm{Var}(y_i); repeat until convergence, c \leftarrow c - [ (c/n) \sum_{i=1}^n \Psi^{(2)}(c \hat{y}_i) - 1 ] / [ (1/n) \sum_{i=1}^n \{ \Psi^{(2)}(c \hat{y}_i) + c \hat{y}_i \Psi^{(3)}(c \hat{y}_i) \} ].
Output: \hat{\beta}^{sls} = c \cdot \hat{\beta}^{ols}.
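A minimal numpy sketch of this procedure for the logistic-regression case is given below; the function name sls_logistic, the sub-sampling option, and the tolerance and iteration constants are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of Algorithm 1 (SLS) for the logistic case, using numpy.
# The function name, sub-sampling option, and tolerance values are illustrative
# assumptions, not the authors' code.
import numpy as np

def sls_logistic(X, y, n_subsample=None, tol=1e-8, max_iter=100, rng=None):
    """Scaled least squares: OLS fit followed by a scalar correction c."""
    n, p = X.shape
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: (optionally sub-sampled) OLS estimator and fitted values.
    if n_subsample is None:
        beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    else:
        S = rng.choice(n, size=n_subsample, replace=False)
        XS = X[S]
        beta_ols = (n_subsample / n) * np.linalg.solve(XS.T @ XS, X.T @ y)
    y_hat = X @ beta_ols

    # Second and third derivatives of the logistic cumulant Psi(t) = log(1 + exp(t)).
    def psi2(t):
        s = 1.0 / (1.0 + np.exp(-t))
        return s * (1.0 - s)

    def psi3(t):
        s = 1.0 / (1.0 + np.exp(-t))
        return s * (1.0 - s) * (1.0 - 2.0 * s)

    # Step 2: Newton's root-finding for c, initialized at 2 / Var(y_i).
    c = 2.0 / np.var(y)
    for _ in range(max_iter):
        f = c * np.mean(psi2(c * y_hat)) - 1.0
        fprime = np.mean(psi2(c * y_hat) + c * y_hat * psi3(c * y_hat))
        step = f / fprime
        c -= step
        if abs(step) < tol:
            break

    # Output: the scaled least squares estimator.
    return c * beta_ols
```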
Open Source Code No The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include any links to a code repository.
Open Datasets Yes The datasets we analyzed were: (i) a synthetic dataset generated from a logistic regression model with iid {Exponential(1) − 1} predictors scaled by (1); (ii) the Higgs dataset (logistic regression) [BSW14]; (iii) a synthetic dataset generated from a Poisson regression model with iid binary(±1) predictors scaled by (2); (iv) the Covertype dataset (Poisson regression) [BD99].
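A minimal sketch of how the two synthetic designs described above could be generated; the sample size, dimension, coefficient values, and the omitted scaling factors are illustrative assumptions.

```python
# Minimal sketch of the two synthetic designs described above; sample size,
# dimension, and coefficient values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, p = 10_000, 20
beta_true = rng.normal(scale=0.1, size=p)  # illustrative coefficients

# (i) logistic regression with iid {Exponential(1) - 1} predictors
X_logit = rng.exponential(1.0, size=(n, p)) - 1.0
prob = 1.0 / (1.0 + np.exp(-(X_logit @ beta_true)))
y_logit = rng.binomial(1, prob)

# (iii) Poisson regression with iid binary(+/-1) predictors
X_pois = rng.choice([-1.0, 1.0], size=(n, p))
y_pois = rng.poisson(np.exp(X_pois @ beta_true))
```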
Dataset Splits No The test error is measured as the mean squared error of the estimated mean using the current parameters at each iteration on a test dataset, which is a randomly selected (and set-aside) 10% portion of the entire dataset. The paper explicitly mentions a 10% test split, which implies a 90% training split, but does not explicitly specify a separate validation split or its size/methodology.
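One plausible reading of this protocol, as a minimal sketch: hold out a random 10% of the rows and report the mean squared error between the estimated mean \Psi^{(1)}(x_i^T \beta) and the observed response on the held-out set. The helper names and the exact form of the metric are assumptions; the logistic case is shown.

```python
# Minimal sketch of the 90/10 split and a test-error computation for the
# logistic case; helper names and the exact form of the metric are assumptions.
import numpy as np

def split_90_10(X, y, rng):
    """Set aside a random 10% of the rows as the test set."""
    n = X.shape[0]
    idx = rng.permutation(n)
    n_test = n // 10
    test, train = idx[:n_test], idx[n_test:]
    return X[train], y[train], X[test], y[test]

def test_error_logistic(X_test, y_test, beta_hat):
    """MSE between the estimated mean Psi'(x^T beta) and the observed response."""
    mu_hat = 1.0 / (1.0 + np.exp(-(X_test @ beta_hat)))
    return np.mean((y_test - mu_hat) ** 2)
```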
Hardware Specification No The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. It only refers to 'large-scale problems' without hardware specifications.
Software Dependencies No The paper mentions using R's built-in functions and various optimization algorithms such as Newton-Raphson, BFGS, LBFGS, GD, AGD, and Newton-Stein. However, it does not provide specific version numbers for any of these software components, libraries, or programming languages.
Experiment Setup Yes For all the algorithms, the step size at each iteration is chosen via the backtracking line search [BV04]. And: We consider two scenarios in our experiments: first, we use the OLS estimator computed for Algorithm 1 to initialize the MLE algorithms; second, we use a random initial value.
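A minimal sketch of a gradient-descent step with backtracking (Armijo) line search on the logistic negative log-likelihood; the constants alpha and gamma and the helper names are assumptions, and the OLS/SLS output or a random vector can be passed as the initial point beta0, mirroring the two initialization scenarios described above.

```python
# Minimal sketch of gradient descent with backtracking (Armijo) line search on
# the logistic negative log-likelihood; alpha and gamma are assumed constants.
import numpy as np

def logistic_nll(beta, X, y):
    t = X @ beta
    return np.mean(np.logaddexp(0.0, t) - y * t)

def logistic_grad(beta, X, y):
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    return X.T @ (p - y) / X.shape[0]

def gd_backtracking(X, y, beta0, alpha=0.3, gamma=0.5, n_iter=50):
    """Gradient descent initialized at beta0 (e.g. the OLS/SLS estimate or a random point)."""
    beta = beta0.copy()
    for _ in range(n_iter):
        g = logistic_grad(beta, X, y)
        step = 1.0
        # Shrink the step until the sufficient-decrease condition holds.
        while logistic_nll(beta - step * g, X, y) > logistic_nll(beta, X, y) - alpha * step * (g @ g):
            step *= gamma
        beta = beta - step * g
    return beta
```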