Oblivious Sketching for Logistic Regression

Authors: Alexander Munteanu, Simon Omlor, David Woodruff

ICML 2021

Reproducibility assessment (variable, result, and supporting evidence):
Research Type: Experimental. Evidence: "Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality." Section 6 (Experiments) adds: "Our results can be reproduced with our open Python implementation available at https://github.com/cxan96/oblivious-sketching-logreg. We compare our oblivious LogReg-sketch algorithm with uniform sampling (UNI), stochastic gradient descent (SGD), and the ℓ2-leverage score (L2S) coreset from (Munteanu et al., 2018)."
Researcher Affiliation: Academia. Evidence: "1 Dortmund Data Science Center, Faculties of Statistics and Computer Science, TU Dortmund University, Dortmund, Germany; 2 Faculty of Statistics, TU Dortmund University, Dortmund, Germany; 3 Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA."
Pseudocode: No. The paper describes the sketching algorithm in Section 3.1 but does not present it as a formally structured pseudocode or algorithm block.
Open Source Code: Yes. Evidence: "Our results can be reproduced with our open Python implementation available at https://github.com/cxan96/oblivious-sketching-logreg."
Open Datasets: Yes. Evidence: "The covertype and kddcup data sets are loaded automatically by our code from the scikit library, and webspam data is loaded from the LIBSVM data repository (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). Additional details on the size and dimensions of the data sets are in the supplementary, Section E."
Dataset Splits: No. The paper uses the well-known 'covertype', 'kddcup', and 'webspam' data sets, which often ship with standard splits, and states: "We repeated each experiment twenty times and displayed the median among all repetitions." However, it does not explicitly specify the training, validation, or test splits used (e.g., percentages, sample counts, or cross-validation folds).
Hardware Specification: No. The paper does not describe the hardware (e.g., specific GPU/CPU models or memory) used to run its experiments.
Software Dependencies: No. The paper mentions an "open Python implementation" and "standard optimizers from the scikit learn library" but does not provide version numbers for Python, scikit-learn, or any other software dependencies needed to replicate the experiments.
Experiment Setup: Yes. Evidence: "The LogReg-sketch uses hmax + 1 = 3 levels and one level of uniform sampling. By the Ky Fan argument all but the largest 25% entries are cut off at each level. The other algorithms were run using their standard parameters."