Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scaled Least Squares Estimator for GLMs in Large-Scale Problems
Authors: Murat A. Erdogdu, Lee H. Dicker, Mohsen Bayati
NeurIPS 2016 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we demonstrate the performance of our algorithm through extensive numerical studies on large-scale real and synthetic datasets, and show that it achieves the highest performance compared to several other widely used optimization algorithms. |
| Researcher Affiliation | Collaboration | Murat A. Erdogdu, Department of Statistics, Stanford University (EMAIL); Mohsen Bayati, Graduate School of Business, Stanford University (EMAIL); Lee H. Dicker, Department of Statistics and Biostatistics, Rutgers University and Amazon (EMAIL) |
| Pseudocode | Yes | Algorithm 1 (SLS: Scaled Least Squares Estimator). Input: data $(y_i, x_i)_{i=1}^n$. Step 1. Compute the least squares estimator $\hat\beta^{\mathrm{ols}}$ and $\hat y = X \hat\beta^{\mathrm{ols}}$; for a sub-sampling based OLS estimator, let $S \subset [n]$ be a random subset and take $\hat\beta^{\mathrm{ols}} = (\tfrac{n}{\lvert S \rvert} X_S^\top X_S)^{-1} X^\top y$. Step 2. Solve $1 = \tfrac{c}{n} \sum_{i=1}^n \psi^{(2)}(c\,\hat y_i)$ for $c \in \mathbb{R}$ using Newton's root-finding method: initialize $c = 2/\mathrm{Var}(y_i)$; repeat until convergence $c \leftarrow c - \big( \tfrac{c}{n} \sum_{i=1}^n \psi^{(2)}(c\,\hat y_i) - 1 \big) \big/ \big( \tfrac{1}{n} \sum_{i=1}^n \big[ \psi^{(2)}(c\,\hat y_i) + c\,\hat y_i\, \psi^{(3)}(c\,\hat y_i) \big] \big)$. Output: $\hat\beta^{\mathrm{sls}} = c \cdot \hat\beta^{\mathrm{ols}}$. (A Python sketch of this algorithm appears below the table.) |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code for the described methodology, nor does it include any links to a code repository. |
| Open Datasets | Yes | The datasets we analyzed were: (i) a synthetic dataset generated from a logistic regression model with iid {exponential(1) − 1} predictors scaled by (1); (ii) the Higgs dataset (logistic regression) [BSW14]; (iii) a synthetic dataset generated from a Poisson regression model with iid binary(±1) predictors scaled by (2); (iv) the Covertype dataset (Poisson regression) [BD99]. |
| Dataset Splits | No | The test error is measured as the mean squared error of the estimated mean, using the current parameters at each iteration, on a test dataset that is a randomly selected (and set-aside) 10% portion of the entire dataset. The explicit 10% test split implies a 90% training split, but the paper does not specify a separate validation split or its size/methodology. (A sketch of such a split appears below the table.) |
| Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. It only refers to 'large-scale problems' without hardware specifications. |
| Software Dependencies | No | The paper mentions using 'R's built-in functions' and various optimization algorithms (Newton-Raphson, BFGS, LBFGS, GD, AGD, and Newton-Stein), but it does not provide version numbers for any of these software components, libraries, or programming languages. |
| Experiment Setup | Yes | For all the algorithms, the step size at each iteration is chosen via backtracking line search [BV04]. In addition: 'We consider two scenarios in our experiments: first, we use the OLS estimator computed for Algorithm 1 to initialize the MLE algorithms; second, we use a random initial value.' (A line-search sketch appears below the table.) |
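
Below is a minimal Python sketch of Algorithm 1 as quoted in the Pseudocode row, specialized to logistic regression with responses $y_i \in \{0, 1\}$, so that $\psi(z) = \log(1 + e^z)$, $\psi^{(2)}(z) = s(z)(1 - s(z))$, and $\psi^{(3)}(z) = \psi^{(2)}(z)(1 - 2 s(z))$ for the sigmoid $s$. The function name `sls_logistic`, the use of the full (non-sub-sampled) OLS step, and the iteration cap are illustrative assumptions, not the authors' code.

```python
import numpy as np

def sls_logistic(X, y, n_iter=50, tol=1e-8):
    """Sketch of Algorithm 1 (SLS) for logistic regression, y in {0, 1}."""
    # Step 1: least squares estimator and its fitted linear predictor.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_ols

    # Step 2: solve 1 = (c / n) * sum_i psi''(c * y_hat_i) for c with
    # Newton's root-finding method, initialized at c = 2 / Var(y_i).
    c = 2.0 / np.var(y)
    for _ in range(n_iter):
        s = 1.0 / (1.0 + np.exp(-c * y_hat))       # sigmoid s(c * y_hat)
        psi2 = s * (1.0 - s)                        # psi''(c * y_hat)
        psi3 = psi2 * (1.0 - 2.0 * s)               # psi'''(c * y_hat)
        f = c * psi2.mean() - 1.0                   # root function f(c)
        fprime = (psi2 + c * y_hat * psi3).mean()   # derivative f'(c)
        step = f / fprime
        c -= step
        if abs(step) < tol:
            break

    # Output: the scaled least squares estimator.
    return c * beta_ols
```

Because Step 2 is one-dimensional root finding over $c$, each Newton iteration costs a single pass over the fitted values, which is what makes the scaling step cheap relative to iterative MLE solvers.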
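
For the 90/10 protocol described in the Dataset Splits row, a random set-aside split might look like the following sketch (the helper name and seed are hypothetical):

```python
import numpy as np

def train_test_split_90_10(n, seed=0):
    """Randomly set aside 10% of the indices as a test portion."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_test = n // 10                   # 10% test split, as stated in the paper
    return idx[n_test:], idx[:n_test]  # train indices, test indices
```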
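
The Experiment Setup row references backtracking line search [BV04]. A minimal sketch of the standard sufficient-decrease (Armijo) rule is below; the parameters `alpha` and `beta` are conventional choices from [BV04], not values reported in the paper.

```python
def backtracking_step(f, grad_x, x, direction, alpha=0.3, beta=0.8):
    """Backtracking line search with the sufficient-decrease condition.

    f: objective function; grad_x: gradient of f at x;
    direction: a descent direction (e.g., -grad_x for gradient descent).
    """
    t = 1.0
    fx = f(x)
    slope = float(grad_x @ direction)   # directional derivative, < 0
    # Shrink t until f(x + t d) <= f(x) + alpha * t * <grad f(x), d>.
    while f(x + t * direction) > fx + alpha * t * slope:
        t *= beta
    return t
```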