Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Scalable high-dimensional Bayesian varying coefficient models with unknown within-subject covariance
Authors: Ray Bai, Mary R. Boland, Yong Chen
JMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the scalability, variable selection performance, and inferential capabilities of our method through simulations and a real data application. |
| Researcher Affiliation | Academia | Ray Bai EMAIL Department of Statistics University of South Carolina Columbia, SC 29201, USA Mary R. Boland EMAIL Department of Mathematics Saint Vincent College Latrobe, PA 15650, USA Yong Chen EMAIL Department of Biostatistics, Epidemiology, and Informatics University of Pennsylvania Philadelphia, PA 19104, USA |
| Pseudocode | Yes | Algorithm 1 ECM algorithm for MAP estimation under NVC-SSL Algorithm 2 MCMC algorithm for NVC-SSL Algorithm 3 Exact algorithm for sampling γ in Step 6 of Algorithm 2 when dp > n Algorithm 4 Approximate algorithm for sampling γ in Step 6 of Algorithm 2 when dp > n |
| Open Source Code | Yes | These algorithms are implemented in the publicly available R package NVCSSL on the Comprehensive R Archive Network. All of the methods in this section were implemented in the publicly available R package NVCSSL, which can be found on the Comprehensive R Archive Network. |
| Open Datasets | Yes | The data that we used comes from the α-factor synchronized cultures of Spellman et al. (1998) and the CHIP-chip data of Lee et al. (2002). |
| Dataset Splits | Yes | To assess their variable selection performance, we fit these models using all n = 47 genes. We also examined the models' predictive power. To do so, we randomly divided the dataset into 37 training observations and 10 test observations. We fit the NVC models to the training data and then used our fitted models to predict the trajectories of mRNA level ŷ(t) for the 10 test observations and compute the out-of-sample MSPE. We repeated this procedure 200 times, so that we had 200 different test sets on which to evaluate these different methods. |
| Hardware Specification | Yes | All of our experiments were performed on an Intel Xeon 8358 Platinum processor with 2.6GHz CPU and 128 GB memory. Running the exact algorithm for 2000 iterations also took 2.3 hours for the one replicate in Figure 3, whereas the approximate algorithm took only 6.2 minutes on an 11th Gen Intel Core i5-1135G7 processor. |
| Software Dependencies | No | The paper mentions the "publicly available R package NVCSSL" and the "R package sns" but does not specify version numbers for R or any other dependencies beyond the package names themselves. For example, it does not state "R version X.X" or "PyTorch 1.9". |
| Experiment Setup | Yes | We compared the MSE for the posterior mean functions β̃k(t)'s obtained from the exact MCMC and the approximate MCMC algorithms. We also compared the average width and the empirical coverage probability (ECP) of the 95% posterior credible intervals. We looked at both the pointwise ECP (i.e. the proportion of pointwise credible intervals that contained the true value of βk(tij) for each observed time point tij, 1 ≤ i ≤ n, 1 ≤ j ≤ mi) and the simultaneous ECP. Here, the simultaneous ECP was determined by the proportion of simulations where all of the posterior credible intervals covered all of the true varying coefficient functions in the entire time domain. In all replications, we ran both the exact and approximate MCMC algorithms introduced in Section 4 for 2000 iterations, discarding the first 500 iterations as burn-in. The remaining 1500 MCMC samples were used to approximate the posteriors and perform uncertainty quantification. Our MCMC algorithms were initialized with the MAP estimator obtained from the ECM algorithm, and all hyperparameters and basis dimensions were the same as those used for the ECM algorithm. |
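The repeated-split evaluation quoted under Dataset Splits (37 training / 10 test observations, 200 repetitions) can be sketched as follows. This is a minimal Python sketch of the general procedure, not the authors' R code; `predict_fn` is a hypothetical stand-in for fitting an NVC model on the training indices and predicting the test trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)

def repeated_split_mspe(y, predict_fn, n_train=37, n_test=10, n_reps=200):
    """Estimate out-of-sample MSPE by repeated random train/test splits.

    Mirrors the paper's setup of splitting n = 47 genes into 37 training
    and 10 test observations, repeated 200 times. `predict_fn` is a
    hypothetical callable (train_idx, test_idx) -> predictions for the
    test observations; it is not part of the NVCSSL package.
    """
    n = len(y)
    mspes = []
    for _ in range(n_reps):
        perm = rng.permutation(n)
        train_idx = perm[:n_train]
        test_idx = perm[n_train:n_train + n_test]
        preds = predict_fn(train_idx, test_idx)
        # Mean squared prediction error on this test set
        mspes.append(np.mean((np.asarray(y)[test_idx] - preds) ** 2))
    # Average MSPE over all random splits
    return float(np.mean(mspes))
```

Averaging the MSPE over many random splits reduces the dependence of the comparison on any single lucky or unlucky partition of the 47 genes.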
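The pointwise and simultaneous ECP definitions quoted under Experiment Setup can be illustrated with a small sketch. The helper below and its array shapes are illustrative assumptions, not code from the paper: given post-burn-in MCMC draws of βk(t) at each observed time point, it forms equal-tailed 95% credible intervals and reports both coverage summaries for one replication.

```python
import numpy as np

def coverage_rates(samples, truth, level=0.95):
    """Pointwise and simultaneous empirical coverage for one replication.

    `samples`: (n_draws, n_points) posterior draws of beta_k(t_ij) after
    burn-in; `truth`: (n_points,) true function values at the same points.
    Returns the fraction of pointwise credible intervals containing the
    truth, and whether *all* intervals cover (the event whose frequency
    across simulations gives the simultaneous ECP).
    """
    alpha = 1.0 - level
    # Equal-tailed credible interval at each time point
    lo = np.quantile(samples, alpha / 2, axis=0)
    hi = np.quantile(samples, 1 - alpha / 2, axis=0)
    covered = (lo <= truth) & (truth <= hi)
    return float(covered.mean()), bool(covered.all())
```

Averaging the first return value over simulation replications gives the pointwise ECP; averaging the second (as a 0/1 indicator) gives the simultaneous ECP, which is necessarily no larger than the pointwise one.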