The Strong Screening Rule for SLOPE
Authors: Johan Larsson, Malgorzata Bogdan, Jonas Wallin
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our numerical experiments show that the rule performs well in practice, leading to improvements by orders of magnitude for data in the p ≫ n domain, as well as incurring no additional computational overhead when n > p. In this section we present simulations that examine the effects of applying the screening rules. |
| Researcher Affiliation | Academia | Johan Larsson, Dept. of Statistics, Lund University (johan.larsson@stat.lu.se); Małgorzata Bogdan, Dept. of Mathematics, University of Wrocław and Dept. of Statistics, Lund University (malgorzata.bogdan@uwr.edu.pl); Jonas Wallin, Dept. of Statistics, Lund University (jonas.wallin@stat.lu.se) |
| Pseudocode | Yes | Algorithm 1. Require: c ∈ ℝ^p, λ ∈ ℝ^p, where λ₁ ≥ ⋯ ≥ λ_p ≥ 0. 1: S ← ∅, B ← ∅; 2: for i = 1, …, p do; 3: B ← B ∪ {i}; 4: if ∑_{j ∈ B} (c_j − λ_j) ≥ 0 then; 5: S ← S ∪ B; 6: B ← ∅; 7: end if; 8: end for; 9: return S |
| Open Source Code | Yes | At the time of this publication, an efficient implementation of the screening rule is available in the R package SLOPE [28]. |
| Open Datasets | Yes | The first three originate from Guyon et al. [31] and were originally collected from the UCI (University of California Irvine) Machine Learning Repository [32], whereas the last data set, golub, was originally published in Golub et al. [33]. All of the data sets were collected from http://statweb.stanford.edu/~tibs/strong/realdata/. e2006-tfidf was collected from Frandi [34], news20 from https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets [37], and physician from https://www.jstatsoft.org/article/view/v027i08 [38]. |
| Dataset Splits | No | The paper mentions "cross-validation" in a general context as a common method for choosing the lambda sequence, but it does not specify the training/validation/test splits used in its own experiments for reproducibility. For instance, it doesn't state specific percentages or counts for validation sets. |
| Hardware Specification | No | All simulations were run on a dedicated high-performance computing cluster. This statement is too general and does not provide specific hardware details (e.g., CPU/GPU models, memory). |
| Software Dependencies | Yes | Throughout the paper we use version 0.2.1 of the R package SLOPE [28], which uses the accelerated proximal gradient algorithm FISTA [29] to estimate all models. |
| Experiment Setup | Yes | Unless stated otherwise, we normalize the predictors such that x̄_j = 0 and ‖x_j‖₂ = 1 for j = 1, …, p. In addition, we center the response vector such that ȳ = 0 when f(β) is the least-squares objective. We use the Benjamini-Hochberg (BH) method [3] for computing the λ sequence, which sets λᵢᴮᴴ = Φ⁻¹(1 − qi/(2p)) for i = 1, 2, …, p, where Φ⁻¹ is the probit function. We choose σ⁽ˡ⁾ to be tσ⁽¹⁾ with t = 10⁻² if n < p and 10⁻⁴ otherwise. Unless stated otherwise, we employ a regularization path of l = 100 λ sequences but stop this path prematurely if 1) the number of unique coefficient magnitudes exceeds the number of observations, 2) the fractional change in deviance from one step to another is less than 10⁻⁵, or 3) the fraction of deviance explained exceeds 0.995. Convergence is obtained when the duality gap as a fraction of the primal and the relative level of infeasibility [30] are lower than 10⁻⁵ and 10⁻³, respectively. |
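The Algorithm 1 pseudocode quoted in the table can be sketched in a few lines. This is an illustrative transcription only: the function name, the assumption that c is already ordered to match the non-increasing λ sequence, and the list-based handling of the sets S and B are choices made here, not details from the paper.

```python
def strong_rule_slope(c, lam):
    """Sketch of the quoted Algorithm 1: given c in R^p and a
    non-increasing sequence lam (lambda_1 >= ... >= lambda_p >= 0),
    return the index set S built by the screening rule."""
    S = []          # screened-in set, accumulated batch by batch
    B = []          # current candidate batch
    running = 0.0   # running value of sum_{j in B} (c_j - lam_j)
    for i in range(len(c)):
        B.append(i)
        running += c[i] - lam[i]
        if running >= 0:   # batch condition met: absorb B into S
            S.extend(B)
            B = []
            running = 0.0
    return S
```

For example, with c = [2.0, 0.5, 0.1] and a constant λ sequence of ones, only the first index satisfies the batch condition, so the rule screens in index 0 alone.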
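The BH-based λ sequence from the experiment-setup row, λᵢᴮᴴ = Φ⁻¹(1 − qi/(2p)), is straightforward to compute. The sketch below uses Python's standard-library `NormalDist` for the probit function Φ⁻¹; the default value q = 0.2 is an assumption for illustration, since the table does not quote the paper's choice of q.

```python
from statistics import NormalDist

def lambda_bh(p, q=0.2):
    """Sketch of the BH lambda sequence from the quoted setup:
    lambda_i = Phi^{-1}(1 - q*i/(2p)) for i = 1, ..., p, where
    Phi^{-1} is the probit (standard normal quantile) function.
    The default q = 0.2 is an illustrative assumption."""
    probit = NormalDist().inv_cdf
    return [probit(1 - q * i / (2 * p)) for i in range(1, p + 1)]
```

Because the argument 1 − qi/(2p) decreases in i, the resulting sequence is non-increasing, matching the λ₁ ≥ ⋯ ≥ λ_p ≥ 0 requirement of Algorithm 1 (for reasonable q, every entry stays positive since the argument remains above 1/2).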