Scalable Bayesian Non-linear Matrix Completion

Authors: Xiangju Qin, Paul Blomstedt, Samuel Kaski

IJCAI 2019

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | An empirical evaluation of the method, using simulations and a benchmark dataset, is given in Section 4. The paper ends with conclusions in Section 5. In this section, we evaluate the predictive performance of the proposed method for out-of-matrix prediction problems on simulated and real-world chemogenomic data, and compare it with two alternative approaches |
| Researcher Affiliation | Academia | Xiangju Qin, Paul Blomstedt and Samuel Kaski, Department of Computer Science, Helsinki Institute for Information Technology HIIT, Aalto University, 00076 Espoo, Finland. xiangju.qin@helsinki.fi, paul.blomstedt@aalto.fi, samuel.kaski@aalto.fi |
| Pseudocode | No | The paper describes computational strategies and methods but does not include any explicitly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | Our implementation is based on the GPy package. (Footnote 1: https://sheffieldml.github.io/GPy/) This refers to a third-party package, not the authors' own implementation. |
| Open Datasets | Yes | We performed the experiments on ExCAPE-DB data [Sun et al., 2017], which is an aggregation of public compound-target bioactivity data and describes interactions between drugs and targets using the pIC50 measure. |
| Dataset Splits | Yes | As the task is to perform out-of-matrix prediction, we randomly selected 20% of the rows as a test set, using the remaining rows as the training set. We used 3-fold cross validation to split the training and test set, where about 30% of the rows or compounds were chosen as test set in each fold. |
| Hardware Specification | No | Macau was run on compute nodes with 20 CPUs; all the other methods were run on a single CPU. (This specifies the number of CPUs but not the specific CPU model or other hardware details.) |
| Software Dependencies | No | "Our implementation is based on the GPy package."; "The dataset has 469 chem2vec features as side information which are generated from ECFP fingerprint features for the compounds using word2vec software."; "We ran the Macau version available in SMURFF software: https://github.com/ExaScience/smurff." None of these mention specific version numbers. |
| Experiment Setup | Yes | The experimental setting for MRD models is: number of inducing points 100, optimization through scaled conjugate gradients (SCG) with 500 iterations. For the SoD approach, the latent variables were initialized with PPCA method. We ran Macau with Gibbs sampling for 1200 iterations, discarded the first 800 samples as burn-in and saved every second of the remaining samples yielding in total 200 posterior samples. We set the dimension of latent variables K=10 for ExCAPE-DB data, K=5 for simulated data for all methods. |
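
The row-wise splits and the Gibbs thinning schedule quoted above can be illustrated with a minimal sketch. This is not the authors' code, which is not released. The function names `holdout_rows`, `kfold_rows` and `kept_gibbs_iterations` are hypothetical, the seeds are arbitrary, and which of each pair of post-burn-in samples is kept is an assumption. Only the 20% holdout, the 3 folds and the 1200/800/2 schedule come from the quoted text.

```python
import random


def holdout_rows(n_rows, test_fraction=0.2, seed=0):
    """Hold out whole rows at random (out-of-matrix prediction).

    Returns (train_rows, test_rows) as sorted lists of row indices.
    """
    rng = random.Random(seed)
    rows = list(range(n_rows))
    rng.shuffle(rows)
    n_test = round(test_fraction * n_rows)
    return sorted(rows[n_test:]), sorted(rows[:n_test])


def kfold_rows(n_rows, n_folds=3, seed=0):
    """Split rows into n_folds folds; each row is in exactly one test set.

    Returns a list of (train_rows, test_rows) pairs.
    """
    rng = random.Random(seed)
    rows = list(range(n_rows))
    rng.shuffle(rows)
    folds = [rows[i::n_folds] for i in range(n_folds)]
    splits = []
    for i, test in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        splits.append((sorted(train), sorted(test)))
    return splits


def kept_gibbs_iterations(n_iterations=1200, burn_in=800, thin=2):
    """Zero-based iteration indices retained after burn-in and thinning.

    Assumption: the first post-burn-in sample of each pair is the one kept.
    """
    return list(range(burn_in, n_iterations, thin))


train_rows, test_rows = holdout_rows(100, test_fraction=0.2)
cv_splits = kfold_rows(90, n_folds=3)
kept = kept_gibbs_iterations()
print(len(train_rows), len(test_rows), len(kept))
```

With 3 folds each test fold holds about a third of the rows, in line with the "about 30%" quoted above, and (1200 - 800) / 2 gives the 200 retained posterior samples.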