Non-Negative Inductive Matrix Completion for Discrete Dyadic Data

Authors: Piyush Rai

AAAI 2017

Reproducibility Variable Result LLM Response
Research Type: Experimental. We evaluate our model by performing experiments on a wide variety of data sets, on both quantitative tasks (matrix completion with side information) as well as qualitative analyses (interpretability of the inferred latent factors). We compare our model with three baselines: (1) gamma-Poisson latent factor model (GPLFM) (Zhou et al. 2012), which is similar in construction to our model but cannot leverage side information; (2) Regression-based Latent Factor Model (RLFM) (Agarwal and Chen 2009); and (3) inductive matrix completion (Chiang, Hsieh, and Dhillon 2015), a state-of-the-art model, which is similar in spirit to our model and can leverage side information. We denote our model by NILFM (for Non-negative Inductive Latent Factor Model). The data sets used in our experiments include: Drug-Target: The drug-target interaction network represents binary-valued interactions between 200 drug molecules and 150 target proteins. As the side information, we have drug and target features representing information from chemical structure similarity and amino acid sequence, respectively.
Researcher Affiliation: Academia. Piyush Rai, Dept. of Computer Science and Engineering, IIT Kanpur, India (piyush@cse.iitk.ac.in)
Pseudocode: No. The paper describes methods using mathematical equations and prose but does not include formal pseudocode or algorithm blocks.
Open Source Code: No. The paper does not provide an explicit statement or link for open-source code for the described methodology.
Open Datasets: Yes. The data sets used in our experiments include: Drug-Target: The drug-target interaction network represents binary-valued interactions between 200 drug molecules and 150 target proteins. As the side information, we have drug and target features representing information from chemical structure similarity and amino acid sequence, respectively. Lazega-Lawyers: We consider the advising relation in the Lazega lawyers dataset consisting of 71 partners and associates. Movielens: We use two versions of this data: Movielens-100K and Movielens-1M. Cora: This data set is a citation network consisting of a total of 2708 research papers.
Dataset Splits: Yes. Each experiment is repeated 5 times with different training/test splits and we report the averaged area under the ROC curve (AUC) for all the data sets. For Cora data, however, we used 50% training data, as the performance of the other baselines was unstable when using only 20% training data.
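The reported evaluation protocol (5 repeated random training/test splits, averaged AUC) can be sketched as below. This is a hedged illustration, not the paper's implementation: the toy binary matrix, the 20% hold-out fraction, and the random placeholder scores stand in for the actual data and NILFM predictions.

```python
import numpy as np

def auc_score(y_true, y_score):
    """AUC via the Mann-Whitney U statistic (assumes no tied scores)."""
    order = np.argsort(y_score)
    ranks = np.empty(len(y_score))
    ranks[order] = np.arange(1, len(y_score) + 1)
    n_pos = int(y_true.sum())
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

rng = np.random.default_rng(0)
# toy stand-in for the 200 x 150 binary drug-target interaction matrix
M = (rng.random((200, 150)) < 0.3).astype(int)

aucs = []
for repeat in range(5):                      # 5 training/test splits
    test_mask = rng.random(M.shape) < 0.2    # hold out ~20% of the entries
    scores = rng.random(M.shape)             # placeholder for model predictions
    aucs.append(auc_score(M[test_mask], scores[test_mask]))

print(f"mean AUC over 5 splits: {np.mean(aucs):.3f}")
```

With random placeholder scores the averaged AUC hovers around 0.5; a real model's predicted scores would be plugged in where `scores` is generated.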
Hardware Specification: No. No specific hardware details (e.g., GPU/CPU models, memory specifications) used for running experiments are provided.
Software Dependencies: No. The paper mentions "unoptimized MATLAB implementations" but does not specify a version number for MATLAB or any other software dependencies.
Experiment Setup: Yes. Experimental Settings. For NILFM, we use Gibbs sampling for model inference. For NILFM as well as GPLFM, the Gibbs sampler is run for 2000 iterations with 1000 post-burn-in collection samples. Our EM-based inference algorithm also yields almost identical results to the Gibbs-sampling-based inference (while being much faster); however, since the Gibbs sampler for our model is fast enough, we used only this in our experiments. In all our experiments, K was set to 20, which worked well in practice for all the data sets. Note that the shrinkage prior on λk effectively prunes out the unnecessary components by shrinking λk to close to zero (Zhou et al. 2012). All the model parameters for GPLFM as well as for our model NILFM are initialized randomly.
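The stated inference schedule (2000 Gibbs iterations, the first 1000 discarded as burn-in, the remaining 1000 collected, K = 20, random initialization) can be sketched as a generic sampling loop. The `gibbs_step` transition here is a hypothetical placeholder: in the paper it would resample the latent factors and the shrinkage weights λk from their conditional distributions.

```python
import numpy as np

rng = np.random.default_rng(0)
N_ITERS, BURN_IN, K = 2000, 1000, 20   # settings reported in the paper

def gibbs_step(state):
    # Placeholder transition; a real sampler would draw each block of
    # latent variables from its full conditional given the rest.
    return state + rng.normal(scale=0.01, size=state.shape)

state = rng.random(K)                  # random initialization of parameters
collected = []
for it in range(N_ITERS):
    state = gibbs_step(state)
    if it >= BURN_IN:                  # keep the 1000 post-burn-in samples
        collected.append(state.copy())

# posterior summaries are averages over the collected samples
posterior_mean = np.mean(collected, axis=0)
```

The burn-in/collection split is the standard MCMC recipe: early iterations are discarded because the chain has not yet reached its stationary distribution, and estimates are formed only from the retained samples.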