Oblivious Sketching for Logistic Regression

Authors: Alexander Munteanu, Simon Omlor, David Woodruff

ICML 2021

Reproducibility assessment (variable, result, and supporting evidence):
Research Type: Experimental. Evidence: "Our sketches are fast, simple, easy to implement, and our experiments demonstrate their practicality." Section 6 (Experiments) adds: "Our results can be reproduced with our open Python implementation available at https://github.com/cxan96/oblivious-sketching-logreg. We compare our oblivious LogReg-sketch algorithm with uniform sampling (UNI), stochastic gradient descent (SGD), and the ℓ2-leverage score (L2S) coreset from (Munteanu et al., 2018)."
Researcher Affiliation: Academia. Evidence: "1 Dortmund Data Science Center, Faculties of Statistics and Computer Science, TU Dortmund University, Dortmund, Germany; 2 Faculty of Statistics, TU Dortmund University, Dortmund, Germany; 3 Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA."
Pseudocode: No. The paper describes the sketching algorithm in Section 3.1 but does not present it as a formally structured pseudocode or algorithm block.
Open Source Code: Yes. Evidence: "Our results can be reproduced with our open Python implementation available at https://github.com/cxan96/oblivious-sketching-logreg."
Open Datasets: Yes. Evidence: "The covertype and kddcup data sets are loaded automatically by our code from the scikit library, and webspam data is loaded from the LIBSVM data repository (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). Additional details on the size and dimensions of the data sets are in the supplementary, Section E."
Dataset Splits: No. The paper uses the well-known 'covertype', 'kddcup', and 'webspam' data sets, which often ship with standard splits, and states: "We repeated each experiment twenty times and displayed the median among all repetitions." However, it does not explicitly specify the training, validation, or test splits used (e.g., percentages, sample counts, or cross-validation folds).
Hardware Specification: No. The paper does not describe the hardware (e.g., specific GPU/CPU models or memory) used to run its experiments.
Software Dependencies: No. The paper mentions an "open Python implementation" and "standard optimizers from the scikit learn library" but does not provide version numbers for Python, scikit-learn, or any other software dependencies needed to replicate the experiments.
Experiment Setup: Yes. Evidence: "The LogReg-sketch uses hmax + 1 = 3 levels and one level of uniform sampling. By the Ky Fan argument all but the largest 25% entries are cut off at each level. The other algorithms were run using their standard parameters."