Oblivious Sketching for Logistic Regression
Authors: Alexander Munteanu, Simon Omlor, David Woodruff
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality. (Section 6, Experiments) Our results can be reproduced with our open Python implementation available at https://github.com/cxan96/oblivious-sketching-logreg. We compare our oblivious Log Reg-sketch algorithm with uniform sampling (UNI), stochastic gradient descent (SGD), and the ℓ2-leverage score (L2S) coreset from (Munteanu et al., 2018). |
| Researcher Affiliation | Academia | (1) Dortmund Data Science Center, Faculties of Statistics and Computer Science, TU Dortmund University, Dortmund, Germany; (2) Faculty of Statistics, TU Dortmund University, Dortmund, Germany; (3) Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA. |
| Pseudocode | No | The paper describes the sketching algorithm in Section 3.1 but does not present it as formal pseudocode or a structured algorithm block. |
| Open Source Code | Yes | Our results can be reproduced with our open Python implementation available at https://github.com/cxan96/oblivious-sketching-logreg. |
| Open Datasets | Yes | The covertype and kddcup data sets are loaded automatically by our code from the scikit library, and webspam data is loaded from the LIBSVM data repository (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). Additional details on the size and dimensions of the data sets are in the supplementary, Section E. (See the data-loading sketch after the table.) |
| Dataset Splits | No | The paper mentions using well-known datasets like 'covertype', 'kddcup', and 'webspam' which often come with standard splits. It also states 'We repeated each experiment twenty times and displayed the median among all repetitions'. However, it does not explicitly specify the training, validation, or testing dataset splits (e.g., percentages, sample counts, or cross-validation folds) used for reproducing the experiments. |
| Hardware Specification | No | The paper does not explicitly describe the hardware (e.g., specific GPU/CPU models, memory) used to run its experiments. |
| Software Dependencies | No | The paper mentions an 'open Python implementation' and 'standard optimizers from the scikit-learn library' but does not provide specific version numbers for Python, scikit-learn, or any other software dependencies needed to replicate the experiments. |
| Experiment Setup | Yes | The Log Reg-sketch uses h_max + 1 = 3 levels and one level of uniform sampling. By the Ky Fan argument, all but the largest 25% of entries are cut off at each level. The other algorithms were run using their standard parameters. (See the uniform-sampling baseline sketch after the table.) |
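
The three data sets quoted in the Open Datasets row are publicly available. A minimal loading sketch follows, assuming scikit-learn's dataset fetchers for covertype and kddcup and a manually downloaded LIBSVM copy of webspam; the preprocessing choices (label binarization, encoding of categorical columns) are assumptions, not taken from the paper's quoted text.

```python
# Hedged sketch of loading the three data sets; the exact preprocessing
# (label binarization, encoding of kddcup's categorical columns) used by
# the authors is not specified in the quoted text and is assumed here.
from sklearn.datasets import fetch_covtype, fetch_kddcup99, load_svmlight_file

# covertype and kddcup99 are available through scikit-learn's fetchers
X_cov, y_cov = fetch_covtype(return_X_y=True)    # forest cover types, 7 classes
X_kdd, y_kdd = fetch_kddcup99(return_X_y=True)   # contains categorical string columns

# webspam must be downloaded manually from the LIBSVM repository
# (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/);
# "webspam.libsvm" below is a placeholder file name for the local copy.
X_web, y_web = load_svmlight_file("webspam.libsvm")
```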
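
The comparison protocol quoted in the table (uniform sampling baseline UNI, twenty repetitions, median reported) can be sketched as below. This only illustrates the baseline loop under an assumed sample size and an assumed full-data loss evaluation; the paper's oblivious Log Reg-sketch itself is implemented in the linked repository and is not reproduced here.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def uniform_sample(X, y, k, rng):
    """UNI baseline: draw k rows uniformly at random."""
    idx = rng.choice(X.shape[0], size=k, replace=False)
    return X[idx], y[idx]

def full_data_logistic_loss(model, X, y):
    """Negative log-likelihood of a fitted model on the full data (y in {0, 1})."""
    p = np.clip(model.predict_proba(X)[:, 1], 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def run_uni_baseline(X, y, k, repetitions=20, seed=0):
    """Repeat the experiment twenty times and report the median, as in the paper."""
    rng = np.random.default_rng(seed)
    losses = []
    for _ in range(repetitions):
        # assumes the subsample contains both classes; a guard could resample otherwise
        Xs, ys = uniform_sample(X, y, k, rng)
        model = LogisticRegression(max_iter=1000).fit(Xs, ys)
        losses.append(full_data_logistic_loss(model, X, y))
    return float(np.median(losses))
```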