Deep Generative Models for Relational Data with Side Information

Authors: Changwei Hu, Piyush Rai, Lawrence Carin

Venue: ICML 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare with various state-of-the-art methods and report results, both quantitative and qualitative, on several benchmark data sets.
Researcher Affiliation | Collaboration | 1) Yahoo! Research, New York, NY, USA; 2) CSE Department, IIT Kanpur, Kanpur, UP, India; 3) Duke University, Durham, NC, USA.
Pseudocode | No | The paper describes the steps of the Gibbs sampler in prose but does not provide structured pseudocode or an algorithm block.
Open Source Code | No | The paper does not include an unambiguous statement about releasing code for the described work, nor does it provide a direct link to a source-code repository.
Open Datasets | Yes | We consider seven real-world data sets... Protein230... NIPS234... Conflicts... Facebook... Metabolic... NIPS 1-17... Citeseer... (Ghosn et al., 2004)
Dataset Splits | No | For the two data sets without side information (Protein230 and NIPS234), we hold out 20% data as our test data. For the remaining five data sets, we hold out 80% data as our test data as we were interested in highly missing data regimes to investigate how much the side information is benefitting in such difficult cases. (No explicit mention of a validation set or split for model tuning.)
Hardware Specification | Yes | All the models are implemented in MATLAB and were run on a standard machine with 2.40GHz processor and 16GB RAM.
Software Dependencies | No | The paper mentions that models are 'implemented in MATLAB' but does not provide specific version numbers for MATLAB or any other ancillary software components.
Experiment Setup | Yes | We set K to a large enough number (K = 100) so that all models are evaluated with sufficient number of latent features. Our models and the other baselines (except HGP-EPM) are run with 1000 burn-in iterations, and another 1000 iterations for sample collection. For the HGP-EPM baseline, we use the default setting from (Zhou, 2015) and run their model for 3000 burn-in and 1000 collection iterations. The samplers are initialized randomly.
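
For orientation, the protocol quoted in the Dataset Splits and Experiment Setup rows can be summarized as a short, hedged sketch: a fraction of the relational matrix is masked out as test data (20% for Protein230 and NIPS234, 80% for the side-information data sets), K = 100 latent features are used, and the sampler runs for 1000 burn-in plus 1000 collection iterations from a random initialization. The toy adjacency matrix and the gibbs_step stub below are illustrative assumptions only; they are not the authors' MATLAB implementation or their actual conditional updates.

# Minimal sketch (assumptions, not the paper's code): hold out a fraction of a
# binary relational matrix as test data, then run a Gibbs-style loop with
# 1000 burn-in + 1000 collection iterations and K = 100 latent features.
import numpy as np

rng = np.random.default_rng(0)

N, K = 230, 100                       # e.g. Protein230 has 230 nodes; K = 100 latent features
A = rng.integers(0, 2, size=(N, N))   # toy binary adjacency matrix (stand-in for real data)

holdout_frac = 0.20                   # 0.80 for the data sets with side information
test_mask = rng.random((N, N)) < holdout_frac
train_mask = ~test_mask

def gibbs_step(A, train_mask, state):
    # Placeholder for one sweep of the paper's Gibbs sampler: the real model
    # updates its latent factors conditioned on the observed (training) entries.
    state["Phi"] = rng.standard_normal((A.shape[0], K))
    return state

state = {"Phi": rng.standard_normal((N, K))}    # samplers are initialized randomly

burn_in, collection = 1000, 1000                # HGP-EPM baseline uses 3000 burn-in instead
post_mean = np.zeros((N, K))
for it in range(burn_in + collection):
    state = gibbs_step(A, train_mask, state)
    if it >= burn_in:
        post_mean += state["Phi"] / collection  # average over collected samples

# The held-out entries (test_mask) would then be scored against predictions
# reconstructed from the collected samples (e.g. link-prediction AUC).

Note that the sketch mirrors why Dataset Splits is marked No above: only a train/test hold-out is described, with no separate validation split for model tuning.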