The Hessian Screening Rule
Authors: Johan Larsson, Jonas Wallin
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Section 4, we present numerical experiments on simulated and real data to showcase the effectiveness of the screening rule, demonstrating that the rule is effective both when p ≫ n and n ≫ p, outperforming the other alternatives that we study. |
| Researcher Affiliation | Academia | Johan Larsson Department of Statistics Lund University johan.larsson@stat.lu.se Jonas Wallin Department of Statistics Lund University jonas.wallin@stat.lu.se |
| Pseudocode | Yes | We outline this technique in Algorithm 1 (Appendix B). [...] The Hessian screening method is presented in full in Algorithm 2 (Appendix B). |
| Open Source Code | Yes | The source code, including a Singularity [25] container and its recipe for reproducing the results, are available at https://github.com/jolars/HessianScreening. |
| Open Datasets | Yes | In this section, we conduct experiments on real data sets. We run 20 iterations for the smaller data sets studied and three for the larger ones. For information on the sources of these data sets, please see Appendix E. [...] For instance, Year Prediction MSD [38], colon-cancer [39], arcene [40], duke-breast-cancer [41], ijcnn1 [42], madelon [43], and news20 [44] are all available from the LIBSVM library [45] (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/). The rcv1 data set [46] is available from the LIBSVM website (https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multilabel.html). The e2006-log1p and e2006-tfidf data sets [47] were downloaded from http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/regression.html. The tcga data set was obtained from TCGA: The Cancer Genome Atlas (https://www.cancer.gov/tcga). |
| Dataset Splits | No | The paper mentions 'model tuning, such as cross-validation' and describes training parameters (e.g., stopping criteria, convergence threshold) but does not explicitly provide percentages or counts for training/validation/test dataset splits within the main text. |
| Hardware Specification | No | The computations were enabled by resources provided by LUNARC. The paper names a computing resource but does not specify details such as CPU/GPU models, memory, or specific machine configurations. |
| Software Dependencies | No | The code used in these experiments was, for every method, programmed in C++ using the Armadillo library [21, 22] and organized as an R package via Rcpp [23]. We used the renv package [24] to maintain dependencies. The paper lists software and libraries but does not provide specific version numbers for them. |
| Experiment Setup | Yes | To construct the regularization path, we adopt the default settings from glmnet: we use a log-spaced path of 100 λ values from λmax to ξλmax, where ξ = 10⁻² if p > n and 10⁻⁴ otherwise. We stop the path whenever the deviance ratio, 1 − dev/dev_null, reaches 0.999 or the fractional decrease in deviance is less than 10⁻⁵. Finally, we also stop the path whenever the number of coefficients ever to be active predictors exceeds p. [...] We use cyclical coordinate descent with shuffling and consider the model to converge when the duality gap G(β, θ) ≤ εζ, where we take ζ to be ‖y‖₂² when fitting the ordinary lasso, and n log 2 when fitting ℓ1-regularized logistic regression. Unless specified, we let ε = 10⁻⁴. These settings are standard settings and, for instance, resemble the defaults used in Celer. For all of the experiments, we employ the line search algorithm used in Blitz. |
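
To make the quoted experiment setup concrete, here is a minimal Python/NumPy sketch of the λ-path construction, the path stopping rules, and the duality-gap convergence check. The function names, the lasso scaling assumed for λmax, and the argument layout are illustrative assumptions, not the authors' implementation; the actual code is the C++/R package linked above.

```python
import numpy as np


def lambda_path(X, y, n_lambda=100):
    """Log-spaced path of 100 lambda values from lambda_max down to
    xi * lambda_max, following the glmnet-style defaults quoted above.
    Assumes the standard lasso scaling
    (1 / (2n)) * ||y - X beta||_2^2 + lambda * ||beta||_1,
    so that lambda_max = ||X^T y||_inf / n."""
    n, p = X.shape
    lambda_max = np.max(np.abs(X.T @ y)) / n
    # xi depends on the aspect ratio of the design, as in the quoted setup
    xi = 1e-2 if p > n else 1e-4
    return np.geomspace(lambda_max, xi * lambda_max, n_lambda)


def stop_path(dev, dev_prev, dev_null, n_ever_active, p):
    """Path stopping criteria from the quoted setup: stop once the deviance
    ratio 1 - dev/dev_null reaches 0.999, the fractional decrease in
    deviance drops below 1e-5, or the number of ever-active predictors
    exceeds p."""
    dev_ratio = 1.0 - dev / dev_null
    frac_decrease = (dev_prev - dev) / dev_prev
    return dev_ratio >= 0.999 or frac_decrease < 1e-5 or n_ever_active > p


def converged(duality_gap, y, n, eps=1e-4, logistic=False):
    """Coordinate-descent convergence check: G(beta, theta) <= eps * zeta,
    with zeta = ||y||_2^2 for the ordinary lasso and n * log(2) for
    l1-regularized logistic regression."""
    zeta = n * np.log(2.0) if logistic else np.dot(y, y)
    return duality_gap <= eps * zeta
```

In a path-following solver, `stop_path` and `converged` would be evaluated after fitting each λ on the grid returned by `lambda_path`, terminating the path or the inner coordinate-descent loop, respectively.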