Revisiting Semi-Supervised Learning with Graph Embeddings

Authors: Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov

ICML 2016

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models."

Researcher Affiliation | Academia | "Zhilin Yang ZHILINY@CS.CMU.EDU, William W. Cohen WCOHEN@CS.CMU.EDU, Ruslan Salakhutdinov RSALAKHU@CS.CMU.EDU — School of Computer Science, Carnegie Mellon University"

Pseudocode | Yes | The paper provides Algorithm 1 ("Sampling Context Distribution p(i, c, γ)") and Algorithm 2 ("Model Training (Transductive)").

Open Source Code | No | The paper links to an implementation of a baseline method, DeepWalk, in footnote 3 ("https://github.com/phanein/deepwalk"), but it provides no statement or link for the authors' own implementation of Planetoid or any other code for their described methodology.

Open Datasets | Yes | "We first considered three text classification datasets, Citeseer, Cora and Pubmed (Sen et al., 2008)." (Footnote 5: http://linqs.umiacs.umd.edu/projects//projects/lbc/) "We next considered the DIEL (Distant Information Extraction using coordinate-term Lists) dataset (Bing et al., 2015)." The paper also derives an entity classification dataset from the knowledge base of Never Ending Language Learning (NELL) (Carlson et al., 2010) and a hierarchical entity classification dataset (Dalvi & Cohen, 2016) that links NELL entities to text in ClueWeb09.

Dataset Splits | No | The paper states: "We tune r2, T1, T2, the learning rate and hyper-parameters in other models based on an additional data split with a different random seed." This implies a validation set was used for tuning, but its size, proportion, and construction are not reported, which limits reproducibility.

Hardware Specification | No | The paper provides no details about the hardware used for experiments, such as CPU or GPU models, memory, or cloud computing instances.

Software Dependencies | No | The paper mentions using "the Junto library (Talukdar & Crammer, 2009) for label propagation, and SVMLight for TSVM" (Footnote 4: http://svmlight.joachims.org/), but it does not specify version numbers for these software components, which is necessary for reproducibility.

Experiment Setup | Yes | "In all of our experiments, we set the model hyper-parameters to r1 = 5/6, q = 10, d = 3, N1 = 200 and N2 = 200 for Planetoid. We use the same r1, q and d for GraphEmb, and the same N1 and N2 for ManiReg and SemiEmb."
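The reported hyper-parameter values can be captured as a small configuration sketch. The key names mirror the paper's symbols; the dict name and structure are shorthand for this note, not identifiers from the authors' code.

```python
# Planetoid hyper-parameters as reported in the paper's experiment setup.
# Key names follow the paper's symbols (r1, q, d, N1, N2); their exact
# roles are defined in the paper, and this dict only records the values.
PLANETOID_CONFIG = {
    "r1": 5 / 6,
    "q": 10,
    "d": 3,
    "N1": 200,
    "N2": 200,
}
# Per the paper: GraphEmb reuses r1, q, and d; ManiReg and SemiEmb
# reuse N1 and N2. r2, T1, T2, and the learning rate were tuned on an
# additional data split.
```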
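The pseudocode the paper provides (Algorithm 1, sampling from a context distribution p(i, c, γ)) can be illustrated with a minimal sketch. This assumes a DeepWalk-style random walk for graph-based contexts, label agreement for label-based contexts, and uniform negative sampling; it is an illustration under those assumptions, not the authors' implementation, and the parameter roles (r1, r2, q, d) are interpretations of the paper's symbols.

```python
import random

def sample_context(adj, labels, r1=5 / 6, r2=0.5, q=10, d=3):
    """Hedged sketch of Planetoid-style context sampling.

    Returns a triple (i, c, gamma): with probability r1 the pair comes
    from a random-walk (graph) context, otherwise from a shared-label
    context; with probability r2 the context is replaced by a uniformly
    sampled node and marked negative (gamma = -1).
    """
    nodes = list(adj)
    if random.random() < r1:
        # Graph context: random walk of length q, pair nodes within window d.
        walk = [random.choice(nodes)]
        for _ in range(q - 1):
            nbrs = adj[walk[-1]]
            walk.append(random.choice(nbrs) if nbrs else random.choice(nodes))
        j = random.randrange(len(walk))
        k = random.randrange(max(0, j - d), min(len(walk), j + d + 1))
        i, c = walk[j], walk[k]
    else:
        # Label context: two labeled nodes sharing a label form a pair.
        i = random.choice(list(labels))
        c = random.choice([n for n in labels if labels[n] == labels[i]])
    if random.random() < r2:
        return i, random.choice(nodes), -1  # corrupted (negative) pair
    return i, c, +1
```

During training these sampled triples would drive a skip-gram-style embedding loss, with γ distinguishing positive from negative pairs.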