Sharing Pattern Submodels for Prediction with Missing Values
Authors: Lena Stempfle, Ashkan Panahi, Fredrik D. Johansson
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate the proposed SPSM model on simulated and real-world data, aiming to answer two main questions: How does the accuracy of SPSM compare to baseline models, including impute-then-regress, for small and large samples? How does sparsity in pattern specializations affect performance and interpretation? In Figure 2, we show the test-set coefficient of determination (R²) for Setting A. We report the results on health care data in Table 2. |
| Researcher Affiliation | Academia | Chalmers University of Technology, Department of Computer Science and Engineering, Gothenburg, Sweden. stempfle@chalmers.se, ashkan.panahi@chalmers.se, fredrik.johansson@chalmers.se |
| Pseudocode | No | The paper describes the proposed method in prose and mathematical formulations but does not include explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code to reproduce experiments and the appendix are available at https://github.com/Healthy-AI/spsm. |
| Open Datasets | Yes | The data is obtained from the publicly available Alzheimer's Disease Neuroimaging Initiative (ADNI) database (http://adni.loni.usc.edu). We use data from the Study to Understand Prognoses and Preferences for Outcomes and Risks of Treatments (SUPPORT) (Knaus et al. 1995). More information on the non-health-related HOUSING (De Cock 2011) data is shown in Appendix C.3. |
| Dataset Splits | Yes | Hyperparameters are selected on the validation set, and this selection aligns with the test-set results. We generate samples with d = 20 and k = 5. In Figure 2, we show the test-set coefficient of determination (R²) for Setting A. Error bars show standard deviation over 5 random data splits (a minimal split sketch follows the table). |
| Hardware Specification | No | The computations were enabled by resources provided by the Swedish National Infrastructure for Computing (SNIC) at Chalmers Centre for Computational Science and Engineering (C3SE) partially funded by the Swedish Research Council through grant agreement no. 2018-05973. This describes the computing environment but does not specify exact hardware components like GPU or CPU models. |
| Software Dependencies | No | Both linear and logistic variants of SPSM were trained using the L-BFGS-B solver provided as part of the SciPy Python package (Virtanen et al. 2020). For imputation, we use zero (I0), mean (Iµ), or iterative imputation (Iit) from scikit-learn (Pedregosa et al. 2011; Van Buuren 2018). A further baseline is XGBoost (XGB), where missing values are supported by default (Chen et al. 2019). While software components are mentioned, specific version numbers for these dependencies are not provided in the text (a hedged imputation sketch follows the table). |
| Experiment Setup | Yes | In the experiments, γ can take values in {0, 0.1, 1, 5, 10, 100}, and we used a shared λm = λ ∈ {1, 5, 10, 100, 1000, 1e8} for all patterns. Intercepts were added for both the main model and for each pattern, without regularization (a hedged grid-search sketch follows the table). |
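
The Dataset Splits row describes selection on a validation set with results averaged over 5 random data splits. Below is a minimal sketch of that protocol; the 60/20/20 proportions and the synthetic data-generating process are illustrative assumptions, not taken from the paper, which only fixes d = 20 features and k = 5 relevant ones.

```python
# Minimal sketch of the split protocol: 5 random train/validation/test splits.
# The 60/20/20 proportions and the data-generating process are assumptions.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))                     # d = 20 features, as in the paper
y = X[:, :5].sum(axis=1) + rng.normal(size=1000)    # k = 5 relevant features (illustrative)

splits = []
for seed in range(5):                               # 5 random data splits
    X_train, X_tmp, y_train, y_tmp = train_test_split(
        X, y, test_size=0.4, random_state=seed)
    X_val, X_test, y_val, y_test = train_test_split(
        X_tmp, y_tmp, test_size=0.5, random_state=seed)
    splits.append((X_train, y_train, X_val, y_val, X_test, y_test))
```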
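The Software Dependencies row names three scikit-learn imputers (zero, mean, iterative) and XGBoost's native missing-value handling. The sketch below shows how these standard components fit together; the toy data and all parameter values are assumptions for illustration.

```python
# Sketch of the three imputation strategies named above, plus XGBoost's
# native NaN handling. Parameter values and the toy data are assumptions.
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import SimpleImputer, IterativeImputer
from xgboost import XGBRegressor

X = np.array([[1.0, np.nan], [2.0, 3.0], [np.nan, 4.0]])
y = np.array([1.0, 2.0, 3.0])

imputers = {
    "I0":  SimpleImputer(strategy="constant", fill_value=0.0),  # zero imputation
    "Imu": SimpleImputer(strategy="mean"),                      # mean imputation
    "Iit": IterativeImputer(max_iter=10, random_state=0),       # iterative imputation
}
X_imputed = {name: imp.fit_transform(X) for name, imp in imputers.items()}

# XGBoost consumes NaN entries directly, so no imputation step is needed.
xgb = XGBRegressor(n_estimators=10).fit(X, y)
```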
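The Experiment Setup row translates to a validation-set grid search over γ and λ. Since the SPSM objective itself is not reproduced in this table, the sketch below fits a stand-in L2-regularized linear model with SciPy's L-BFGS-B solver (the solver the dependencies row names) and selects (γ, λ) by validation R²; γ's role in the real SPSM objective is not shown here, so it is carried along purely to mirror the grid. The split variables are reused from the split sketch above.

```python
# Hedged sketch: validation-set grid search over the gamma and lambda grids
# quoted above. The loss below is a stand-in ridge-style objective fit with
# L-BFGS-B, NOT the actual SPSM objective; gamma is kept only to mirror the grid.
# Reuses X_train, y_train, X_val, y_val, X_test, y_test from the split sketch.
import numpy as np
from scipy.optimize import minimize
from sklearn.metrics import r2_score

def fit_linear_lbfgsb(X, y, lam):
    """Fit an L2-regularized linear model with an unregularized intercept."""
    d = X.shape[1]
    def loss(w):
        beta, b = w[:d], w[d]            # intercept b is not regularized
        r = X @ beta + b - y
        return r @ r + lam * beta @ beta
    res = minimize(loss, np.zeros(d + 1), method="L-BFGS-B")
    return res.x[:d], res.x[d]

gammas = [0, 0.1, 1, 5, 10, 100]         # grid quoted in the row above
lams = [1, 5, 10, 100, 1000, 1e8]

best = None
for gamma in gammas:                     # unused by the stand-in model
    for lam in lams:
        beta, b = fit_linear_lbfgsb(X_train, y_train, lam)
        score = r2_score(y_val, X_val @ beta + b)
        if best is None or score > best[0]:
            best = (score, gamma, lam, beta, b)

score, gamma, lam, beta, b = best
print(f"selected gamma={gamma}, lambda={lam}; "
      f"test R2={r2_score(y_test, X_test @ beta + b):.3f}")
```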