Longitudinal Deep Kernel Gaussian Process Regression
Authors: Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant G Honavar (pp. 8556–8564)
AAAI 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Results of extensive experiments on several benchmark data sets demonstrate that L-DKGPR significantly outperforms the state-of-the-art longitudinal data analysis (LDA) methods. We compare L-DKGPR to several state-of-the-art LDA and GP methods on simulated as well as real-world benchmark data. The experiments are designed to answer research questions about the accuracy, scalability, and interpretability of L-DKGPR. |
| Researcher Affiliation | Academia | Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant G Honavar Pennsylvania State University {jul672, yxw514, dux19, vhonavar}@psu.edu |
| Pseudocode | Yes | Algorithm 1: L-DKGPR |
| Open Source Code | Yes | Data and codes used in this paper are publicly available at https://github.com/junjieliang672/L-DKGPR. |
| Open Datasets | Yes | We used one simulated data set and three real-world longitudinal data sets in our experiments. Simulated Data. We construct simulated longitudinal data that exhibit longitudinal correlation (LC) and multilevel correlation (MC) as follows: The outcome is generated using y = f(X) + ϵ, where f(X) is a non-linear transformation based on the observed covariate matrix X and the residual ϵ ∼ N(0, Σ). To simulate longitudinal correlation, we simply set Σ to a block-diagonal matrix with non-zero entries for within-individual observations. To simulate multilevel correlation, we first split the individuals into C clusters and assign non-zero entries for the data points in the same cluster. Following (Cheng et al. 2019; Timonen et al. 2019), we simulate 40 individuals, 20 observations, and 30 covariates for each individual. We vary the number of clusters C within [2, 5]. Study of Women's Health Across the Nation (SWAN) (Sutton-Tyrrell et al. 2005). General Social Survey (GSS) (Smith et al. 2015). The Alzheimer's Disease Prediction challenge (TADPOLE) (Marinescu et al. 2018). |
| Dataset Splits | Yes | We use 50%, 20%, 30% of data for training, validation, and testing respectively. |
| Hardware Specification | No | Because not all baseline methods take advantage of GPU acceleration, we compare the run times of all the methods without GPU acceleration. For CPU run time analysis, please refer to our Appendix. This text mentions 'GPU acceleration' and 'CPU run time' but does not specify any exact hardware models or specifications (e.g., NVIDIA A100, Intel Xeon). |
| Software Dependencies | No | Implementation details and hyper-parameter settings of L-DKGPR as well as the baseline approaches are provided in the Appendix. The paper also links to a GitHub repository for code and data, but it does not explicitly list software dependencies with version numbers within the main text. |
| Experiment Setup | Yes | To evaluate the regression performance, similar to (Liang et al. 2020), we compute the mean and standard deviation of R2 between the actual and predicted outcomes of each method on each data set across 10 independent runs. We use 50%, 20%, 30% of data for training, validation, and testing respectively. Implementation details and hyper-parameter settings of L-DKGPR as well as the baseline approaches are provided in the Appendix. |
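
The simulated-data protocol described in the table (outcome y = f(X) + ϵ with a block-diagonal Σ for longitudinal correlation, 40 individuals × 20 observations × 30 covariates, and a 50/20/30 train/validation/test split) can be sketched as below. The particular non-linearity (`tanh`), the within-individual correlation strength `rho`, and the random seed are illustrative assumptions, since the paper defers these specifics to its appendix.

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the paper: 40 individuals, 20 observations each, 30 covariates.
n_individuals, n_obs, n_covariates = 40, 20, 30
N = n_individuals * n_obs

# Covariates and a non-linear outcome function f(X).
# tanh of a random linear projection is an illustrative choice only.
X = rng.standard_normal((N, n_covariates))
f_X = np.tanh(X @ rng.standard_normal(n_covariates))

# Longitudinal correlation: block-diagonal Sigma with non-zero entries
# only for observations belonging to the same individual.
rho = 0.5  # assumed within-individual correlation
block = rho * np.ones((n_obs, n_obs)) + (1 - rho) * np.eye(n_obs)
Sigma = np.kron(np.eye(n_individuals), block)

# Correlated residuals and the simulated outcome y = f(X) + eps.
eps = rng.multivariate_normal(np.zeros(N), Sigma)
y = f_X + eps

# 50% / 20% / 30% split for training, validation, and testing.
idx = rng.permutation(N)
n_tr, n_va = int(0.5 * N), int(0.2 * N)
train, val, test = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
```

For the multilevel-correlation variant, the same construction would additionally assign non-zero Σ entries to observation pairs whose individuals fall in the same cluster, with the number of clusters C varied in [2, 5].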