reproducibilityindex.ai

DECOrrelated feature space partitioning for distributed sparse regression

Authors: Xiangyu Wang, David B. Dunson, Chenlei Leng

NeurIPS 2016

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we present the empirical performance of DECO via extensive numerical experiments. In particular, we compare DECO after 2 stage fitting (DECO-2) and DECO after 3 stage fitting (DECO-3) with the full data lasso (lasso-full), the full data lasso with ridge refinement (lasso-refine) and lasso with a naive feature partition without decorrelation (lasso-naive).
Researcher Affiliation	Academia	Xiangyu Wang Dept. of Statistical Science Duke University wwrechard@gmail.com David Dunson Dept. of Statistical Science Duke University dunson@stat.duke.edu Chenlei Leng Dept. of Statistics University of Warwick C.Leng@warwick.ac.uk
Pseudocode	Yes	Algorithm 1 The DECO framework
Open Source Code	No	The paper does not state that the code for the DECO methodology is open-source or provide a link to a code repository for their method.
Open Datasets	Yes	Student performance dataset. We look at one of the two datasets used for evaluating student achievement in two Portuguese schools [20]. Mammalian eye diseases. This dataset, taken from [21], was collected to study mammalian eye diseases... Electricity load diagram. This dataset [22] consists of electricity load from 2011-2014 for 370 clients.
Dataset Splits	Yes	The ridge parameter r_2 is chosen by 5-fold cross-validation for both DECO-3 and lasso-refine. To compare the performance of all methods, we use 10-fold cross validation and record the prediction error (mean square error, MSE), model size and runtime. Because cross-validation is computationally demanding for such a large dataset, we put the first 200 clients in the training set and the remaining 170 clients in the testing set.
Hardware Specification	Yes	All the algorithms are coded and timed in Matlab on computers with Intel i7-3770k cores.
Software Dependencies	No	The paper mentions that algorithms are "coded and timed in Matlab" and uses "glmnet" and "extended BIC criterion", but it does not provide specific version numbers for any of these software components.
Experiment Setup	Yes	The variance σ^2 is chosen such that R̂^2 = var(Xβ)/var(Y) = 0.9. We use glmnet [18] to fit lasso and choose the tuning parameter using the extended BIC criterion [19] with γ fixed at 0.5. For DECO, the features are partitioned randomly in Stage 1 and the tuning parameter r_1 is fixed at 1 for DECO-3. Since DECO-2 does not involve any refinement step, we choose r_1 to be 10 to aid robustness. The ridge parameter r_2 is chosen by 5-fold cross-validation for both DECO-3 and lasso-refine. The model dimension and the sample size are fixed at p = 10,000 and n = 500 respectively and the number of subsets is fixed as m = 100.
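
The simulation settings quoted in the Experiment Setup row can be sketched in a few lines. The following is a minimal illustration in Python with NumPy, not the authors' Matlab code. The sizes n, p and m and the target ratio 0.9 come from the row above. The Gaussian design, the sparse coefficient vector and the helper names `noise_variance_for_r2` and `random_feature_partition` are assumptions made only for this example, and the sketch assumes the noise is independent of the signal, so that var(Y) = var(Xβ) + σ^2.

```python
import numpy as np


def noise_variance_for_r2(signal, target_r2=0.9):
    """Return sigma^2 such that var(signal) / (var(signal) + sigma^2) equals target_r2.

    Hypothetical helper: assumes var(Y) = var(signal) + sigma^2 (independent noise).
    """
    signal_var = np.var(signal)
    return signal_var * (1.0 - target_r2) / target_r2


def random_feature_partition(p, m, rng):
    """Randomly split the feature indices 0..p-1 into m groups of nearly equal size.

    Hypothetical helper illustrating the random Stage 1 partition.
    """
    return np.array_split(rng.permutation(p), m)


rng = np.random.default_rng(0)
n, p, m = 500, 10_000, 100  # sample size, dimension, number of subsets (from the table)

# Assumed design and coefficients: the table does not specify them.
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0

signal = X @ beta
sigma2 = noise_variance_for_r2(signal, target_r2=0.9)
y = signal + rng.normal(scale=np.sqrt(sigma2), size=n)

# Each group holds the column indices that one worker would receive.
groups = random_feature_partition(p, m, rng)
```

The ratio `np.var(signal) / np.var(y)` is only approximately 0.9 for a finite sample, because the simulated noise has a sample variance that differs slightly from `sigma2`. With p = 10,000 and m = 100, each group contains 100 feature indices.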