Longitudinal Deep Kernel Gaussian Process Regression

Authors: Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant G Honavar

AAAI 2021, pp. 8556-8564 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Results of extensive experiments on several benchmark data sets demonstrate that L-DKGPR significantly outperforms the state-of-the-art longitudinal data analysis (LDA) methods. We compare L-DKGPR to several state-of-the-art LDA and GP methods on simulated as well as real-world benchmark data. The experiments are designed to answer research questions about the accuracy, scalability, and interpretability of L-DKGPR.
Researcher Affiliation | Academia | Junjie Liang, Yanting Wu, Dongkuan Xu, Vasant G Honavar, Pennsylvania State University, {jul672, yxw514, dux19, vhonavar}@psu.edu
Pseudocode | Yes | Algorithm 1: L-DKGPR. (For orientation, a generic deep kernel GP regression sketch is given after the table.)
Open Source Code | Yes | Data and codes used in this paper are publicly available at https://github.com/junjieliang672/L-DKGPR.
Open Datasets | Yes | We used one simulated data set and three real-world longitudinal data sets in our experiments. Simulated Data: we construct simulated longitudinal data that exhibit the two correlation structures of interest, i.e., longitudinal correlation (LC) and multilevel correlation (MC), as follows. The outcome is generated using y = f(X) + ϵ, where f(X) is a non-linear transformation of the observed covariate matrix X and the residual ϵ ∼ N(0, Σ). To simulate longitudinal correlation, we simply set Σ to a block-diagonal matrix with non-zero entries for within-individual observations. To simulate multilevel correlation, we first split the individuals into C clusters and assign non-zero entries to the data points in the same cluster. Following (Cheng et al. 2019; Timonen et al. 2019), we simulate 40 individuals, 20 observations, and 30 covariates for each individual, and vary the number of clusters C over [2, 5]. Real-world data: Study of Women's Health Across the Nation (SWAN) (Sutton-Tyrrell et al. 2005); General Social Survey (GSS) (Smith et al. 2015); The Alzheimer's Disease Prediction (TADPOLE) (Marinescu et al. 2018). (A simulation sketch is given after the table.)
Dataset Splits | Yes | We use 50%, 20%, and 30% of the data for training, validation, and testing, respectively. (A split sketch is given after the table.)
Hardware Specification | No | Because not all baseline methods take advantage of GPU acceleration, we compare the run times of all the methods without GPU acceleration. For CPU run time analysis, please refer to our Appendix. The text mentions 'GPU acceleration' and 'CPU run time' but does not specify any exact hardware models or specifications (e.g., NVIDIA A100, Intel Xeon).
Software Dependencies | No | Implementation details and hyper-parameter settings of L-DKGPR as well as the baseline approaches are provided in the Appendix. The paper also links to a GitHub repository for code and data, but it does not explicitly list software dependencies with version numbers in the main text.
Experiment Setup | Yes | To evaluate the regression performance, similar to (Liang et al. 2020), we compute the mean and standard deviation of R2 between the actual and predicted outcomes of each method on each data set across 10 independent runs. We use 50%, 20%, and 30% of the data for training, validation, and testing, respectively. Implementation details and hyper-parameter settings of L-DKGPR as well as the baseline approaches are provided in the Appendix. (An evaluation sketch is given after the table.)
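
For orientation on the Pseudocode row: the paper's Algorithm 1 is not reproduced here, but the general deep kernel GP regression pattern it builds on (a neural feature extractor feeding a GP kernel, trained jointly) can be sketched as follows. This is a generic deep kernel learning model, not L-DKGPR itself, which additionally models longitudinal and multilevel correlation; the use of GPyTorch, the network sizes, and the RBF kernel choice are all assumptions for illustration.

```python
import torch
import gpytorch


class FeatureExtractor(torch.nn.Sequential):
    """Small MLP mapping covariates to a latent space (sizes are assumed)."""
    def __init__(self, in_dim, latent_dim=2):
        super().__init__(
            torch.nn.Linear(in_dim, 32),
            torch.nn.ReLU(),
            torch.nn.Linear(32, latent_dim),
        )


class DeepKernelGP(gpytorch.models.ExactGP):
    """Generic deep kernel GP: the kernel acts on learned features,
    so kernel and network parameters are optimized jointly."""
    def __init__(self, train_x, train_y, likelihood):
        super().__init__(train_x, train_y, likelihood)
        self.feature_extractor = FeatureExtractor(train_x.size(-1))
        self.mean_module = gpytorch.means.ConstantMean()
        self.covar_module = gpytorch.kernels.ScaleKernel(
            gpytorch.kernels.RBFKernel())

    def forward(self, x):
        z = self.feature_extractor(x)  # kernel operates on learned features
        return gpytorch.distributions.MultivariateNormal(
            self.mean_module(z), self.covar_module(z))


likelihood = gpytorch.likelihoods.GaussianLikelihood()
train_x, train_y = torch.randn(100, 30), torch.randn(100)  # placeholder data
model = DeepKernelGP(train_x, train_y, likelihood)
```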
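The simulation described in the Open Datasets row can be reproduced in outline. Below is a minimal NumPy sketch, assuming a tanh transform for f(X) and particular covariance magnitudes (0.3 within individual, 0.1 within cluster); neither assumption is specified in the text above, and both keep Σ positive semi-definite.

```python
import numpy as np


def simulate_longitudinal(n_individuals=40, n_obs=20, n_covariates=30,
                          n_clusters=None, seed=0):
    """Sketch of y = f(X) + eps with eps ~ N(0, Sigma).

    Sigma is block-diagonal over individuals (longitudinal correlation);
    if n_clusters is given, individuals in the same cluster also get
    non-zero cross-covariance entries (multilevel correlation).
    """
    rng = np.random.default_rng(seed)
    n = n_individuals * n_obs
    X = rng.normal(size=(n, n_covariates))
    f = np.tanh(X @ rng.normal(size=n_covariates))  # assumed non-linear f

    # Longitudinal correlation: one block per individual.
    Sigma = np.zeros((n, n))
    for i in range(n_individuals):
        blk = slice(i * n_obs, (i + 1) * n_obs)
        Sigma[blk, blk] = 0.3  # assumed within-individual covariance
    if n_clusters is not None:
        # Multilevel correlation: individuals in the same cluster covary.
        cluster = rng.integers(0, n_clusters, size=n_individuals)
        for i in range(n_individuals):
            for j in range(n_individuals):
                if i != j and cluster[i] == cluster[j]:
                    Sigma[i * n_obs:(i + 1) * n_obs,
                          j * n_obs:(j + 1) * n_obs] = 0.1  # assumed magnitude
    np.fill_diagonal(Sigma, 1.0)

    eps = rng.multivariate_normal(np.zeros(n), Sigma)
    return X, f + eps


# Varying C over [2, 5] as in the described protocol, e.g.:
X, y = simulate_longitudinal(n_clusters=3)
```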
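The 50/20/30 split from the Dataset Splits row is straightforward; one sketch is below. Whether the paper splits at the observation level or the individual level is not stated here, so the observation-level split is an assumption.

```python
import numpy as np


def split_indices(n, seed=0):
    """Random 50/20/30 split into train/validation/test index arrays."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_val = int(0.5 * n), int(0.2 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


train_idx, val_idx, test_idx = split_indices(800)
```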
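Finally, the evaluation protocol from the Experiment Setup row (mean and standard deviation of R2 over 10 independent runs) amounts to the following. The placeholder predictions stand in for any of the compared methods and are not the paper's models.

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination: R^2 = 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


rng = np.random.default_rng(0)
scores = []
for run in range(10):          # 10 independent runs, as in the protocol
    y = rng.normal(size=240)   # stand-in for a test-set outcome
    y_hat = y + rng.normal(scale=0.5, size=240)  # stand-in predictions
    scores.append(r2_score(y, y_hat))

print(f"R2 = {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```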