Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
High-dimensional Varying Index Coefficient Models via Stein's Identity
Authors: Sen Na, Zhuoran Yang, Zhaoran Wang, Mladen Kolar
JMLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, we conduct extensive numerical experiments to corroborate the theoretical results. |
| Researcher Affiliation | Academia | Sen Na, Department of Statistics, University of Chicago, Chicago, IL 60637, USA; Zhuoran Yang, Department of Operations Research and Financial Engineering, Princeton University, Princeton, NJ 08544, USA; Zhaoran Wang, Department of Industrial Engineering and Management Sciences, Northwestern University, Evanston, IL 60208, USA; Mladen Kolar, The University of Chicago Booth School of Business, Chicago, IL 60637, USA |
| Pseudocode | No | The paper describes mathematical models and estimation procedures using equations, but does not present any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is available for download at: https://github.com/senna1128/Varying-Index-Coefficient-Models. |
| Open Datasets | Yes | The data set was obtained from Dryad at https://datadryad.org/resource/doi:10.5061/dryad.1139fm7. |
| Dataset Splits | Yes | The data set has $n = 215$ samples evaluated at two locations in total, where $n_1 = 119$ of them are collected from the first population group with 45748 SNPs measured, while $n_2 = 96$ of them are collected from the second population group with 59332 SNPs measured. There are 38106 SNPs in common and we select $d_2 = 250$ from them uniformly at random. To make individuals independent from each other, in each group we only use the data evaluated at the first location for the first half of individuals and the data evaluated at the second location for the second half of individuals. |
| Hardware Specification | No | This work was completed in part with resources provided by the University of Chicago Research Computing Center. |
| Software Dependencies | No | We use default settings in CVX package (Grant and Boyd, 2008, 2012) to solve (A.1) efficiently. |
| Experiment Setup | Yes | According to Theorem 3 and 7, we set $\lambda_k = 30\sqrt{\log d_1 d_2 / n}$ and $\tau = 2(n/\log d_1 d_2)^{1/6}$. According to Theorem 9, we [...] $\log(d_1 + d_2)/(n(d_1 + d_2))$ and $\lambda = 12\sqrt{(d_1 + d_2)\log(d_1 + d_2)/n}$. The precision matrix estimator we use is defined in (14) with $\kappa_2 = 2\sqrt{\log d_2/(n d_2)}$, suggested by Lemma 12. According to Theorem 13, we set $\tau = 2(n/\log d_1 d_2)^{1/6}$ and $\lambda = 10\sqrt{\log d_1 d_2 / n}$. The sparse precision matrix estimator is defined in (A.1) and (A.2) with truncation threshold $2(n/\log d_2)^{1/4}$ and $\gamma = 10\sqrt{\log d_2/n}$. We compute the sparse matrix estimator via (16) with $\tau = (n/\log d_1 d_2)^{1/6}$ and $\lambda = \sqrt{\log d_1 d_2 / n}$. The sparse precision matrix is estimated by conducting CLIME procedure with $\gamma = 5\sqrt{\log d_2/n}$. |
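
The tuning-parameter choices quoted in the Experiment Setup row are closed-form functions of the sample size and dimensions, so they can be evaluated directly. The sketch below is illustrative only. It assumes the first quoted setting reads λ_k = 30·sqrt(log(d1·d2)/n) and τ = 2·(n/log(d1·d2))^(1/6), and that the sparse precision matrix setting reads truncation threshold 2·(n/log d2)^(1/4) and γ = 10·sqrt(log d2/n). The function names and the value of `d1` are hypothetical and are not taken from the authors' repository.

```python
import math


def sparse_tuning(n: int, d1: int, d2: int,
                  c_lam: float = 30.0, c_tau: float = 2.0) -> dict:
    """Evaluate lambda and tau for the sparse setting.

    Assumed reading of the quoted setup (not the authors' code):
      lambda = c_lam * sqrt(log(d1 * d2) / n)
      tau    = c_tau * (n / log(d1 * d2)) ** (1/6)
    """
    log_d = math.log(d1 * d2)
    return {
        "lam": c_lam * math.sqrt(log_d / n),
        "tau": c_tau * (n / log_d) ** (1.0 / 6.0),
    }


def precision_tuning(n: int, d2: int,
                     c_gamma: float = 10.0, c_trunc: float = 2.0) -> dict:
    """Evaluate the truncation threshold and gamma for the precision estimator.

    Assumed reading of the quoted setup (not the authors' code):
      threshold = c_trunc * (n / log(d2)) ** (1/4)
      gamma     = c_gamma * sqrt(log(d2) / n)
    """
    log_d2 = math.log(d2)
    return {
        "trunc": c_trunc * (n / log_d2) ** 0.25,
        "gamma": c_gamma * math.sqrt(log_d2 / n),
    }


if __name__ == "__main__":
    # n and d2 come from the Dataset Splits row; d1 = 20 is a placeholder.
    print(sparse_tuning(n=215, d1=20, d2=250))
    print(precision_tuning(n=215, d2=250))
```

The leading constants are exposed as arguments because the quoted setup uses different constants across experiments (for example 30 and 10 for λ).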