DECOrrelated feature space partitioning for distributed sparse regression
Authors: Xiangyu Wang, David B. Dunson, Chenlei Leng
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present the empirical performance of DECO via extensive numerical experiments. In particular, we compare DECO after 2-stage fitting (DECO-2) and DECO after 3-stage fitting (DECO-3) with the full-data lasso (lasso-full), the full-data lasso with ridge refinement (lasso-refine) and lasso with a naive feature partition without decorrelation (lasso-naive). |
| Researcher Affiliation | Academia | Xiangyu Wang Dept. of Statistical Science Duke University wwrechard@gmail.com David Dunson Dept. of Statistical Science Duke University dunson@stat.duke.edu Chenlei Leng Dept. of Statistics University of Warwick C.Leng@warwick.ac.uk |
| Pseudocode | Yes | Algorithm 1: The DECO framework |
| Open Source Code | No | The paper does not state that the code for the DECO methodology is open-source or provide a link to a code repository for their method. |
| Open Datasets | Yes | Student performance dataset. We look at one of the two datasets used for evaluating student achievement in two Portuguese schools [20]. Mammalian eye diseases. This dataset, taken from [21], was collected to study mammalian eye diseases... Electricity load diagram. This dataset [22] consists of electricity load from 2011 to 2014 for 370 clients. |
| Dataset Splits | Yes | The ridge parameter r₂ is chosen by 5-fold cross-validation for both DECO-3 and lasso-refine. To compare the performance of all methods, we use 10-fold cross-validation and record the prediction error (mean squared error, MSE), model size and runtime. Because cross-validation is computationally demanding for such a large dataset, we put the first 200 clients in the training set and the remaining 170 clients in the testing set. |
| Hardware Specification | Yes | All the algorithms are coded and timed in Matlab on computers with Intel i7-3770K cores. |
| Software Dependencies | No | The paper mentions that algorithms are "coded and timed in Matlab" and uses "glmnet" and "extended BIC criterion", but it does not provide specific version numbers for any of these software components. |
| Experiment Setup | Yes | The variance σ² is chosen such that R̂² = var(Xβ)/var(Y) = 0.9. We use glmnet [18] to fit the lasso and choose the tuning parameter using the extended BIC criterion [19] with γ fixed at 0.5. For DECO, the features are partitioned randomly in Stage 1, and the tuning parameter r₁ is fixed at 1 for DECO-3. Since DECO-2 does not involve any refinement step, we choose r₁ = 10 to aid robustness. The ridge parameter r₂ is chosen by 5-fold cross-validation for both DECO-3 and lasso-refine. The model dimension and the sample size are fixed at p = 10,000 and n = 500, respectively, and the number of subsets is fixed at m = 100. |
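The noise calibration in the Experiment Setup row can be made explicit. Assuming the noise is independent of Xβ, var(Y) = var(Xβ) + σ², so requiring var(Xβ)/var(Y) = 0.9 gives σ² = var(Xβ)(1 − 0.9)/0.9 = var(Xβ)/9. The Python sketch below applies this rule with the sample variance of the realized signal as a plug-in. The Gaussian design, the 5-sparse β and the helper name `noise_variance_for_r2` are hypothetical; the excerpt does not say how X and β were generated.

```python
import numpy as np


def noise_variance_for_r2(signal, target_r2=0.9):
    """Return sigma^2 such that var(signal) / (var(signal) + sigma^2) == target_r2.

    Assumes the noise is independent of the signal, so that
    var(Y) = var(X @ beta) + sigma^2 in the population.
    """
    if not 0.0 < target_r2 < 1.0:
        raise ValueError("target_r2 must lie strictly between 0 and 1")
    signal_var = np.var(signal)  # sample variance used as a plug-in for var(X beta)
    return signal_var * (1.0 - target_r2) / target_r2


# Hypothetical design: i.i.d. standard normal X and a 5-sparse beta
# (the excerpt fixes p and n but not the distribution of X or beta).
rng = np.random.default_rng(0)
n, p = 500, 10_000
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:5] = 1.0
signal = X @ beta
sigma2 = noise_variance_for_r2(signal, 0.9)
Y = signal + rng.normal(scale=np.sqrt(sigma2), size=n)
print(sigma2 / np.var(signal))  # 1/9 (about 0.111) by construction
```

The realized ratio var(Xβ)/var(Y) in a finite sample will only be close to 0.9, because the sample noise variance and the sample covariance between signal and noise fluctuate around their population values.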
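The DECO pipeline compared in these experiments (random feature partition, decorrelation, an independent lasso on each feature subset, and, for DECO-3, a ridge refit on the selected features) can be sketched in Python with NumPy. This is a minimal single-machine illustration, not the authors' Matlab implementation, and it rests on assumptions that this excerpt does not confirm: the decorrelation matrix is taken as F = √p (XXᵀ + r₁Iₙ)^(−1/2), the Stage 3 refit is a ridge regression of the centred Y on the selected columns of the centred X, a fixed λ replaces the extended-BIC choice, and a plain coordinate-descent solver stands in for glmnet. The function names `lasso_cd`, `decorrelation_matrix` and `deco` are hypothetical.

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=500, tol=1e-6):
    """Coordinate-descent lasso for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1."""
    n, p = X.shape
    cols = np.ascontiguousarray(X.T)          # row j of `cols` is column j of X
    col_sq = (cols ** 2).sum(axis=1) / n      # x_j^T x_j / n
    b = np.zeros(p)
    resid = np.array(y, dtype=float)          # resid = y - X b, with b = 0 initially
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            # correlation of column j with the partial residual y - X b + x_j b_j
            rho = cols[j] @ resid / n + col_sq[j] * b[j]
            new_bj = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            step = new_bj - b[j]
            if step != 0.0:
                resid -= step * cols[j]
                b[j] = new_bj
                max_step = max(max_step, abs(step))
        if max_step < tol:
            break
    return b


def decorrelation_matrix(gram, p, r1):
    """ASSUMED form F = sqrt(p) * (gram + r1 * I_n)^(-1/2), with gram = X X^T (n x n)."""
    evals, evecs = np.linalg.eigh(gram + r1 * np.eye(gram.shape[0]))
    return np.sqrt(p) * (evecs / np.sqrt(evals)) @ evecs.T


def deco(X, Y, m, lam, r1=1.0, r2=None, seed=0):
    """DECO-2 when r2 is None, DECO-3 (ridge refinement with parameter r2) otherwise."""
    n, p = X.shape
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean()       # centring only, not full standardization
    rng = np.random.default_rng(seed)
    groups = np.array_split(rng.permutation(p), m)  # Stage 1: random feature partition
    # Each worker would send X_(i) X_(i)^T; their sum is the full n x n Gram matrix X X^T.
    gram = sum(Xc[:, g] @ Xc[:, g].T for g in groups)
    F = decorrelation_matrix(gram, p, r1)
    Yt = F @ Yc
    beta = np.zeros(p)
    for g in groups:                                # Stage 2: one independent lasso per subset
        beta[g] = lasso_cd(F @ Xc[:, g], Yt, lam)
    if r2 is None:
        return beta
    # Stage 3 (ASSUMED form): ridge refit on the selected support using the centred data
    support = np.flatnonzero(beta)
    refined = np.zeros(p)
    if support.size:
        Xs = Xc[:, support]
        A = Xs.T @ Xs + r2 * np.eye(support.size)
        refined[support] = np.linalg.solve(A, Xs.T @ Yc)
    return refined


# Hypothetical toy problem, far smaller than the p = 10,000 / n = 500 / m = 100 setting.
rng = np.random.default_rng(1)
n, p, m = 200, 400, 4
X = rng.standard_normal((n, p))
true_support = np.array([3, 150, 321])
beta_true = np.zeros(p)
beta_true[true_support] = 5.0
Y = X @ beta_true + 0.5 * rng.standard_normal(n)

beta_deco2 = deco(X, Y, m, lam=1.0, r1=10.0)         # DECO-2 with r1 = 10, as in the setup row
beta_deco3 = deco(X, Y, m, lam=1.0, r1=1.0, r2=1.0)  # DECO-3 with r1 = 1; r2 fixed, not cross-validated
print(np.flatnonzero(beta_deco2), beta_deco3[true_support])
```

In this sketch only the n × n matrix XXᵀ has to be shared across feature subsets: each subset's contribution X_(i)X_(i)ᵀ can be formed locally and summed, which is what the `gram = sum(...)` line mimics, after which the per-subset lasso fits are fully independent.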