Discriminative Embeddings of Latent Variable Models for Structured Data

Authors: Hanjun Dai, Bo Dai, Le Song

ICML 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In applications involving millions of data points, we showed that structure2vec runs 2 times faster, produces models which are 10,000 times smaller, while at the same time achieving the state-of-the-art predictive performance.
Researcher Affiliation | Academia | Hanjun Dai, Bo Dai {HANJUNDAI, BODAI}@GATECH.EDU Le Song LSONG@CC.GATECH.EDU College of Computing, Georgia Institute of Technology, Atlanta, USA
Pseudocode | Yes | Algorithm 1 Embedding Mean Field; Algorithm 2 Embedding Loopy BP; Algorithm 3 Discriminative Embedding
Open Source Code | Yes | Our algorithms are implemented with C++ and CUDA, and experiments are carried out on clusters equipped with NVIDIA Tesla K20. The code is available on https://github.com/Hanjun-Dai/graphnn.
Open Datasets | Yes | We compare our algorithm on string benchmark datasets with existing sequence kernels, i.e., the spectrum string kernel (Leslie et al., 2002a), the mismatch string kernel (Leslie et al., 2002b), and the Fisher kernel with HMM generative models (Jaakkola & Haussler, 1999). On graph benchmark datasets, we compare with the subtree kernel (Ramon & Gärtner, 2003) (R&G, for short), the random walk kernel (Gärtner et al., 2003; Vishwanathan et al., 2010), the shortest path kernel (Borgwardt & Kriegel, 2005), the graphlet kernel (Shervashidze et al., 2009), and the family of Weisfeiler-Lehman kernels (WL kernels) (Shervashidze et al., 2011). The first one (denoted as SCOP) contains 7329 sequences obtained from the SCOP (Structural Classification of Proteins) 1.59 database (Andreeva et al., 2004). ... The Harvard Clean Energy Project (Hachmann et al., 2011) is a theory-driven search for the next generation of organic solar cell materials.
Dataset Splits | Yes | Unless explicitly mentioned otherwise, we perform cross validation for all methods and report the average performance. We include the details of tuning hyperparameters for baselines and our methods in Appendix E.2.
Hardware Specification | Yes | Our algorithms are implemented with C++ and CUDA, and experiments are carried out on clusters equipped with NVIDIA Tesla K20.
Software Dependencies | No | Our algorithms are implemented with C++ and CUDA, and experiments are carried out on clusters equipped with NVIDIA Tesla K20. ... We use RDKit (Landrum, 2012) to extract features for atoms (nodes) and bonds (edges).
Experiment Setup | No | The paper mentions using stochastic gradient descent and discusses loss functions, but it does not provide specific hyperparameter values (e.g., learning rate, batch size, number of epochs) or detailed optimizer settings in the main text. It defers tuning details to Appendix E.2.
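The Pseudocode row cites Algorithm 1 (Embedding Mean Field), which iteratively updates each node's embedding from its own features and its neighbors' current embeddings, then pools node embeddings into a graph-level vector. Below is a minimal NumPy sketch of that style of fixed-point update; all names, shapes, the ReLU nonlinearity, and the iteration count are illustrative assumptions, not the authors' implementation (which is C++/CUDA, per the Open Source Code row).

```python
import numpy as np

def mean_field_embedding(node_feats, adj, W1, W2, num_iters=4):
    """Sketch of a mean-field-style graph embedding (illustrative, not the paper's code).

    node_feats : (n, d_in) node feature matrix
    adj        : (n, n) 0/1 symmetric adjacency matrix
    W1, W2     : hypothetical weights of shapes (d_in, d) and (d, d)
    """
    n = node_feats.shape[0]
    d = W1.shape[1]
    mu = np.zeros((n, d))                       # initialize all node embeddings
    for _ in range(num_iters):
        neighbor_sum = adj @ mu                 # aggregate neighbors' embeddings
        mu = np.maximum(node_feats @ W1 + neighbor_sum @ W2, 0.0)  # ReLU update
    return mu.sum(axis=0)                       # sum-pool into a graph embedding

# Tiny usage example on a 3-node path graph.
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]], dtype=float)
feats = rng.standard_normal((3, 2))
emb = mean_field_embedding(feats, adj,
                           rng.standard_normal((2, 4)),
                           rng.standard_normal((4, 4)))
print(emb.shape)  # (4,)
```

The resulting fixed-length vector can then be fed to any discriminative model, which is the sense in which the paper's Algorithm 3 trains embedding and classifier parameters jointly.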