Least Squares Revisited: Scalable Approaches for Multi-class Prediction
Authors: Alekh Agarwal, Sham Kakade, Nikos Karampatziakis, Le Song, Gregory Valiant
ICML 2014
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On the empirical side, we show how to scale our approach to high dimensional datasets, achieving dramatic computational speedups over popular optimization packages such as Liblinear and Vowpal Wabbit on standard datasets (MNIST and CIFAR-10), while attaining state-of-the-art accuracies. |
| Researcher Affiliation | Collaboration | Alekh Agarwal (alekha@microsoft.com), Microsoft Research, New York, NY; Sham M. Kakade (skakade@microsoft.com), Microsoft Research, Cambridge, MA; Nikos Karampatziakis (nikosk@microsoft.com), Microsoft Cloud and Information Services Lab, Redmond, WA; Le Song (lsong@cc.gatech.edu), College of Computing, Georgia Tech, Atlanta, Georgia; Gregory Valiant (valiant@stanford.edu), Computer Science Department, Stanford University, CA |
| Pseudocode | Yes | Algorithm 1 (Generalized Least Squares). Input: initial weight matrix $W_0$, data $\{(x_i, y_i)\}$, Lipschitz constant $L$, link $g = \nabla\Phi$. Repeat until convergence: define the (vector-valued) predictions $\hat{y}_i^{(t)} = g(W_t x_i)$ and the empirical expectations $\hat{\Sigma} = \hat{\mathbb{E}}[xx^\top] = \frac{1}{n}\sum_{i=1}^{n} x_i x_i^\top$ and $\hat{\mathbb{E}}[(\hat{y}^{(t)} - y)x^\top] = \frac{1}{n}\sum_{i=1}^{n} (\hat{y}_i^{(t)} - y_i)x_i^\top$; then update the weight matrix $W_t$: $W_{t+1}^\top = W_t^\top - \frac{1}{L}\,\hat{\Sigma}^{-1}\,\hat{\mathbb{E}}[(\hat{y}^{(t)} - y)x^\top]$ (Eq. 6). A hedged NumPy sketch of this update appears after the table. |
| Open Source Code | No | The paper does not include an unambiguous statement that the authors are releasing their source code for the methodology described, nor does it provide a direct link to a code repository for their work. |
| Open Datasets | Yes | We consider four datasets (MNIST, CIFAR-10, 20 Newsgroups, and RCV1) that capture many of the challenges encountered in real-world learning tasks. |
| Dataset Splits | No | The paper mentions using datasets like MNIST and CIFAR-10, but it does not specify exact training, validation, or test dataset splits with percentages, absolute sample counts, or references to predefined splits needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, memory amounts, or detailed computer specifications used for running its experiments. It only mentions general computational aspects like 'computational speedups' and 'simple MATLAB implementation'. |
| Software Dependencies | No | The paper mentions software like 'Liblinear', 'Vowpal Wabbit', and 'MATLAB implementation' but does not provide specific version numbers for these or any other ancillary software components needed to replicate the experiments. |
| Experiment Setup | Yes | Concretely, we fit blocks of features (either 512 or 1024) with Algorithm 3, with three alternative update rules on each stage: linear regression, Calibration, and Logistic (50 inner-loop iterations). Our calibrated variant again uses the functions $y$, $y^2$, and $y^3$ of previous predictions as additional features in our new batch of features. A hedged sketch of this staged procedure also appears after the table. |
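To make the quoted pseudocode concrete, here is a minimal NumPy sketch of the Algorithm 1 update (Eq. 6). It is an illustration only: the function name `generalized_least_squares`, the zero initialization, the ridge term added before inverting $\hat{\Sigma}$, and the fixed iteration count in place of a convergence test are all assumptions, not details taken from the paper.

```python
import numpy as np

def generalized_least_squares(X, Y, g, L, n_iters=100, ridge=1e-6):
    """Sketch of the Algorithm 1 update (Eq. 6).

    X : (n, d) data matrix; Y : (n, k) one-hot label matrix;
    g : link function applied row-wise (g = grad Phi, e.g. softmax);
    L : Lipschitz constant of the link.
    The ridge term stabilizes the inverse and is an assumption.
    """
    n, d = X.shape
    W = np.zeros((Y.shape[1], d))              # W0 = 0 (assumed init)
    Sigma = X.T @ X / n + ridge * np.eye(d)    # Sigma_hat = E[x x^T]
    Sigma_inv = np.linalg.inv(Sigma)
    for _ in range(n_iters):                   # "until convergence" in the paper
        Y_hat = g(X @ W.T)                     # y_hat_i^(t) = g(W_t x_i)
        G = (Y_hat - Y).T @ X / n              # E[(y_hat^(t) - y) x^T], shape (k, d)
        # Eq. (6): W^T <- W^T - (1/L) Sigma_hat^{-1} E[(y_hat - y) x^T];
        # Sigma_inv is symmetric, so the update can be applied on the right of G.
        W = W - (1.0 / L) * G @ Sigma_inv
    return W
```

For a multiclass logistic model, `g` would be the row-wise softmax; the appropriate Lipschitz constant depends on the link (for softmax, $L = 1$ is a safe if loose choice, since the softmax Jacobian has operator norm at most $1/2$).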
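The experiment-setup excerpt also lends itself to a sketch. Below is a hedged reading of the staged, block-wise fitting with the Calibration update: each stage regresses the labels on a fresh block of features augmented with $y$, $y^2$, and $y^3$ of the previous stage's predictions. Everything here (the function name, the ridge stabilization, and solving each stage by ordinary least squares) is an assumption for illustration, not the authors' implementation of Algorithm 3.

```python
import numpy as np

def staged_calibrated_fit(X, Y, block_size=512, ridge=1e-6):
    """Hedged sketch of block-wise fitting with the 'Calibration' update.

    X : (n, d) feature matrix, consumed in blocks of `block_size` columns;
    Y : (n, k) one-hot label matrix.
    Each stage after the first appends y, y^2, y^3 of the previous
    predictions as extra features, then solves a least-squares problem.
    """
    n, d = X.shape
    preds = np.zeros((n, Y.shape[1]))
    for start in range(0, d, block_size):
        block = X[:, start:start + block_size]
        if start == 0:
            F = block                          # first stage: raw features only
        else:
            # calibration features: elementwise powers of prior predictions
            F = np.hstack([block, preds, preds**2, preds**3])
        # ridge-stabilized least squares on this stage's features (assumed)
        A = F.T @ F + ridge * np.eye(F.shape[1])
        Ws = np.linalg.solve(A, F.T @ Y)
        preds = F @ Ws                         # updated predictions for next stage
    return preds
```

The block sizes of 512 or 1024 quoted above would be passed as `block_size`; the paper's Logistic variant would replace the per-stage least-squares solve with 50 inner-loop iterations of a logistic update, which this sketch does not attempt.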