Distributed High-Dimensional Quantile Regression: Estimation Efficiency and Support Recovery
Authors: Caixing Wang, Ziliang Shen
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on synthetic examples and a real data application further demonstrate the effectiveness of the proposed method. (Abstract); Numerical verification. Another contribution of this work is the comprehensive studies on the validity and effectiveness of the proposed algorithm in various synthetic and real-life examples, which further support the theoretical findings in this paper. (Introduction) |
| Researcher Affiliation | Academia | Caixing Wang and Ziliang Shen, School of Statistics and Management, Shanghai University of Finance and Economics. Correspondence to: Caixing Wang <wang.caixing@stu.sufe.edu.cn>. |
| Pseudocode | Yes | Algorithm 1 Distributed high-dimensional sparse quantile regression (DHSQR). |
| Open Source Code | Yes | The R code to reproduce our experimental results is available at https://github.com/Wang-Caixing-96/DHSQR. |
| Open Datasets | Yes | The R code to reproduce our experimental results is available at https://github.com/Wang-Caixing-96/DHSQR. (Footnote on page 5); drug sensitivity data of the Human Immunodeficiency Virus (HIV) (Rhee et al., 2003; Hu et al., 2021). This data is sourced from the Stanford University HIV Drug Resistance Database (http://hivdb.stanford.edu). (Appendix C.1) |
| Dataset Splits | Yes | For the global and local bandwidths, we set $h = 5(s\log N/n)^{1/3}$ and $b = 0.53(s\log n/n)^{1/3}$, respectively, according to the theoretical results in Theorems 3.7 and 3.8. The regularization parameters $\lambda_{N,g}$ are selected by validation. Specifically, we choose $C_0$ to minimize the check loss on the validation set. All the simulation results are the average of 100 independent experiments. (Section 4); we randomly selected a training dataset with a sample size of $N_{tr} = 1500$, a validation dataset with a sample size of $N_{va} = 300$ to select the optimal penalty parameter $\lambda$, and the remaining data served as the test dataset, which had a sample size of $N_{te} = 246$. (Appendix C.1) |
| Hardware Specification | No | The paper states 'We further study the computation efficiency of our proposed estimator. We fix the local sample size n = 500 and vary the total sample size N.' but does not specify any hardware components like CPU, GPU, or memory used for the computations. |
| Software Dependencies | No | The paper mentions 'R package quantreg (Koenker, 2005) or conquer (He et al., 2023)' but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For the global and local bandwidths, we set $h = 5(s\log N/n)^{1/3}$ and $b = 0.53(s\log n/n)^{1/3}$, respectively, according to the theoretical results in Theorems 3.7 and 3.8. The regularization parameters $\lambda_{N,g}$ are selected by validation. Specifically, we choose $C_0$ to minimize the check loss on the validation set. All the simulation results are the average of 100 independent experiments. (Section 4). For the rest of the experiments in this section, we fix the number of iterations $T = 10$. (Section 4.1) |
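As a quick sanity check on the bandwidth rules quoted in the Experiment Setup row, the sketch below evaluates them for a given sparsity $s$, total sample size $N$, and local sample size $n$. It is a minimal Python sketch, not the authors' R code. It transcribes the formulas exactly as they appear in the extracted text above, including the $N/n$ ratio inside $h$, which is worth checking against the paper. It also assumes $\log$ is the natural logarithm. The function name `bandwidths` is hypothetical.

```python
import math
from typing import Tuple


def bandwidths(s: int, N: int, n: int) -> Tuple[float, float]:
    """Return (h, b) as transcribed from the checklist text.

    h = 5    * (s * log(N) / n) ** (1/3)   # global bandwidth
    b = 0.53 * (s * log(n) / n) ** (1/3)   # local bandwidth
    Assumption: log is the natural logarithm.
    """
    if s <= 0 or n <= 1 or N < n:
        raise ValueError("need s > 0, n > 1 and N >= n")
    h = 5.0 * (s * math.log(N) / n) ** (1.0 / 3.0)
    b = 0.53 * (s * math.log(n) / n) ** (1.0 / 3.0)
    return h, b


if __name__ == "__main__":
    # Illustrative values only; n = 500 matches the local sample size
    # quoted in the Hardware Specification row.
    h, b = bandwidths(s=5, N=10_000, n=500)
    print(f"h = {h:.4f}, b = {b:.4f}")
```

With $N$ fixed, $h$ shrinks as $n$ grows. The local bandwidth $b$ depends only on $s$ and $n$, so changing $N$ leaves it untouched.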
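The tuning step quoted above (choosing $C_0$ to minimize the check loss on the validation set) can be illustrated with the standard quantile check loss $\rho_\tau(u) = u\{\tau - \mathbb{1}(u < 0)\}$ and a plain grid search. This is a minimal sketch under stated assumptions. It takes a hypothetical callable `fit(c0)` that returns a coefficient vector for a given $C_0$. This checklist does not say how $C_0$ enters $\lambda_{N,g}$, so that mapping is left to the caller. The helper names are illustrative and do not come from the DHSQR repository.

```python
from typing import Callable, Optional, Sequence, Tuple


def check_loss(u: float, tau: float) -> float:
    """Quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def mean_check_loss(beta: Sequence[float], X: Sequence[Sequence[float]],
                    y: Sequence[float], tau: float) -> float:
    """Average check loss of the linear predictor x_i' beta on (X, y)."""
    if len(y) == 0:
        raise ValueError("validation set is empty")
    total = 0.0
    for xi, yi in zip(X, y):
        pred = sum(bj * xj for bj, xj in zip(beta, xi))
        total += check_loss(yi - pred, tau)
    return total / len(y)


def select_c0(candidates: Sequence[float],
              fit: Callable[[float], Sequence[float]],
              X_val: Sequence[Sequence[float]],
              y_val: Sequence[float],
              tau: float) -> Tuple[float, float]:
    """Grid search: return (C0, loss) with the smallest validation check loss.

    `fit` is a hypothetical callable mapping C0 to fitted coefficients,
    e.g. a DHSQR run whose penalty lambda_{N,g} is built from C0.
    """
    best_c0: Optional[float] = None
    best_loss = float("inf")
    for c0 in candidates:
        loss = mean_check_loss(fit(c0), X_val, y_val, tau)
        if loss < best_loss:
            best_c0, best_loss = c0, loss
    if best_c0 is None:
        raise ValueError("no candidate C0 values given")
    return best_c0, best_loss
```

Because `fit` is passed in, the same selection loop works whether the underlying estimator is the distributed method or a single-machine baseline. Ties keep the first candidate in the grid.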
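The HIV data split described in Appendix C.1 amounts to a random partition of row indices. It uses 1500 training, 300 validation, and 246 test observations, for 2046 in total. Below is a minimal sketch using Python's standard `random` module. The function name `split_indices` and the seed handling are assumptions, and the authors' R code may draw the split differently.

```python
import random
from typing import List, Optional, Tuple


def split_indices(n_total: int, n_train: int = 1500, n_val: int = 300,
                  seed: Optional[int] = None) -> Tuple[List[int], List[int], List[int]]:
    """Randomly partition range(n_total) into train/validation/test indices.

    The test set is whatever remains after the train and validation draws.
    """
    if n_train + n_val > n_total:
        raise ValueError("train + validation sizes exceed the number of rows")
    idx = list(range(n_total))
    random.Random(seed).shuffle(idx)  # seeded RNG makes the split repeatable
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test


if __name__ == "__main__":
    tr, va, te = split_indices(2046, seed=0)
    print(len(tr), len(va), len(te))
```

Repeating the split with different seeds, as in the 100 repeated simulation runs, only requires varying `seed`.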