Deep Generative Models for Relational Data with Side Information

Authors: Changwei Hu, Piyush Rai, Lawrence Carin

Venue: ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare with various state-of-the-art methods and report results, both quantitative and qualitative, on several benchmark data sets.
Researcher Affiliation | Collaboration | 1) Yahoo! Research, New York, NY, USA; 2) CSE Department, IIT Kanpur, Kanpur, UP, India; 3) Duke University, Durham, NC, USA.
Pseudocode | No | The paper describes the steps of the Gibbs sampler in prose but does not provide structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not include an unambiguous statement about releasing code for the described work, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | We consider seven real-world data sets... Protein230... NIPS234... Conflicts... Facebook... Metabolic... NIPS 1-17... Citeseer... (Ghosn et al., 2004)
Dataset Splits | No | For the two data sets without side information (Protein230 and NIPS234), we hold out 20% data as our test data. For the remaining five data sets, we hold out 80% data as our test data as we were interested in highly missing data regimes to investigate how much the side information is benefitting in such difficult cases. (No explicit mention of a validation set or split for model tuning.)
Hardware Specification | Yes | All the models are implemented in MATLAB and were run on a standard machine with 2.40GHz processor and 16GB RAM.
Software Dependencies | No | The paper mentions that models are 'implemented in MATLAB' but does not provide specific version numbers for MATLAB or any other ancillary software components.
Experiment Setup | Yes | We set K to a large enough number (K = 100) so that all models are evaluated with sufficient number of latent features. Our models and the other baselines (except HGP-EPM) are run with 1000 burn-in iterations, and another 1000 iterations for sample collection. For the HGP-EPM baseline, we use the default setting from (Zhou, 2015) and run their model for 3000 burn-in and 1000 collection iterations. The samplers are initialized randomly.
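
For orientation, the protocol quoted in the Dataset Splits and Experiment Setup rows can be summarized as a short, hedged sketch: a fraction of the relational matrix is masked out as test data (20% for Protein230 and NIPS234, 80% for the side-information data sets), K = 100 latent features are used, and the sampler runs for 1000 burn-in plus 1000 collection iterations from a random initialization. The toy adjacency matrix and the gibbs_step stub below are illustrative assumptions only; they are not the authors' MATLAB implementation or their actual conditional updates.

# Minimal sketch (assumptions, not the paper's code): hold out a fraction of a
# binary relational matrix as test data, then run a Gibbs-style loop with
# 1000 burn-in + 1000 collection iterations and K = 100 latent features.
import numpy as np

rng = np.random.default_rng(0)

N, K = 230, 100                       # e.g. Protein230 has 230 nodes; K = 100 latent features
A = rng.integers(0, 2, size=(N, N))   # toy binary adjacency matrix (stand-in for real data)

holdout_frac = 0.20                   # 0.80 for the data sets with side information
test_mask = rng.random((N, N)) < holdout_frac
train_mask = ~test_mask

def gibbs_step(A, train_mask, state):
    # Placeholder for one sweep of the paper's Gibbs sampler: the real model
    # updates its latent factors conditioned on the observed (training) entries.
    state["Phi"] = rng.standard_normal((A.shape[0], K))
    return state

state = {"Phi": rng.standard_normal((N, K))}    # samplers are initialized randomly

burn_in, collection = 1000, 1000                # HGP-EPM baseline uses 3000 burn-in instead
post_mean = np.zeros((N, K))
for it in range(burn_in + collection):
    state = gibbs_step(A, train_mask, state)
    if it >= burn_in:
        post_mean += state["Phi"] / collection  # average over collected samples

# The held-out entries (test_mask) would then be scored against predictions
# reconstructed from the collected samples (e.g. link-prediction AUC).

Note that the sketch mirrors why Dataset Splits is marked No above: only a train/test hold-out is described, with no separate validation split for model tuning.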