Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].
Debiased Distributed Learning for Sparse Partial Linear Models in High Dimensions
Authors: Shaogao Lv, Heng Lian
JMLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Finally, some simulated experiments are carried out to illustrate the empirical performances of our debiased technique under the distributed setting. |
| Researcher Affiliation | Academia | Shaogao Lv EMAIL Department of Statistics and Data Science Nanjing Audit University Nanjing, China. Heng Lian EMAIL Department of Mathematics City University of Hong Kong Hong Kong, China |
| Pseudocode | No | The paper describes the methodology using mathematical equations and textual explanations, but does not include any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about making code available, nor does it provide links to source code repositories. |
| Open Datasets | No | We generate the data from the model (1), where $\beta = (1, 2, 1, 0.5, 2, 0, \ldots, 0)$ and $\epsilon_i \sim N(0, 4)$. We then generate a vector $Z_i \in \mathbb{R}^p$ from a mean-zero multivariate Gaussian distribution with correlations $\mathrm{Cov}(Z_{ij}, Z_{ij'}) = 0.3^{\lvert j - j' \rvert}$, $1 \le j, j' \le p$, and then set $T_i = \Phi(Z_{i1})$ and $X_{ij} = Z_{ij}$, $j = 2, \ldots, p$, where $\Phi$ is the cumulative distribution function of the standard normal distribution so that $T_i \in (0, 1)$. |
| Dataset Splits | No | The paper describes how the total data N is randomly allocated to m machines for distributed processing, and specifies various N and m values in the simulations (e.g., N=2000, m=10). However, it does not provide specific training, test, or validation dataset splits for model evaluation. |
| Hardware Specification | No | The simulations are carried out on the computational cluster Katana in the University of New South Wales. |
| Software Dependencies | No | The paper does not specify any software dependencies with version numbers. |
| Experiment Setup | Yes | We select the tuning parameters in the penalties by 5-fold cross-validation in each local machine. We set N = 2000, m = 1, 10 (m = 1 is the centralized estimator) and p = 100, 200, 400, 800, 1600. We generate 200 data sets for each setting. |
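The simulation design quoted in the Open Datasets and Dataset Splits rows can be sketched in Python as below. This is a minimal sketch under stated assumptions, not the authors' code (the paper releases none). It reads $N(0, 4)$ as variance 4 (standard deviation 2), takes $\mathrm{Var}(Z_{ij}) = 1$, and assumes the $N$ observations are randomly split into $m$ near-equal shards. Model (1) and its nonparametric component are not reproduced in this report, so the sketch only draws $T_i$, $X_i$ and $\epsilon_i$. The names `simulate_covariates` and `allocate_to_machines` are hypothetical.

```python
import math

import numpy as np


def simulate_covariates(n, p, rho=0.3, noise_var=4.0, seed=None):
    """Draw (T, X, eps) following the quoted simulation design.

    Assumptions (not confirmed by the report): Var(Z_ij) = 1, so
    Cov(Z_ij, Z_ij') = rho ** |j - j'|, and N(0, 4) means variance 4.
    The response Y needs model (1), which is not given here, so it is
    not generated.
    """
    rng = np.random.default_rng(seed)
    idx = np.arange(p)
    # AR(1)-type correlation matrix: entry (j, j') equals rho ** |j - j'|
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    z = rng.multivariate_normal(np.zeros(p), cov, size=n)
    # Standard normal CDF via the error function: Phi(v) = (1 + erf(v / sqrt 2)) / 2
    phi = np.vectorize(lambda v: 0.5 * (1.0 + math.erf(v / math.sqrt(2.0))))
    # T_i = Phi(Z_i1), which lies in (0, 1)
    t = phi(z[:, 0])
    # X_ij = Z_ij for j = 2, ..., p (1-indexed), i.e. drop the first column
    x = z[:, 1:]
    # Noise with variance noise_var, hence standard deviation sqrt(noise_var)
    eps = rng.normal(0.0, math.sqrt(noise_var), size=n)
    return t, x, eps


def allocate_to_machines(n, m, seed=None):
    """Hypothetical helper: randomly split row indices 0..n-1 into m near-equal shards."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n), m)


if __name__ == "__main__":
    t, x, eps = simulate_covariates(n=2000, p=100, seed=0)
    shards = allocate_to_machines(2000, 10, seed=1)
    print(t.shape, x.shape, eps.shape, [len(s) for s in shards])
```

For the $N = 2000$, $m = 10$ setting, `allocate_to_machines(2000, 10)` yields ten shards of 200 row indices each; the 5-fold cross-validation described in the Experiment Setup row would then be run separately within each shard.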