Revisiting Semi-Supervised Learning with Graph Embeddings

Authors: Zhilin Yang, William W. Cohen, Ruslan Salakhutdinov

ICML 2016

Reproducibility Variable | Result | LLM Response

Research Type | Experimental | "On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models."

Researcher Affiliation | Academia | "Zhilin Yang ZHILINY@CS.CMU.EDU, William W. Cohen WCOHEN@CS.CMU.EDU, Ruslan Salakhutdinov RSALAKHU@CS.CMU.EDU — School of Computer Science, Carnegie Mellon University"

Pseudocode | Yes | The paper provides Algorithm 1 ("Sampling Context Distribution p(i, c, γ)") and Algorithm 2 ("Model Training (Transductive)").

Open Source Code | No | The paper links to an implementation of a baseline method, DeepWalk, in footnote 3 ("https://github.com/phanein/deepwalk"), but it provides no statement or link for the authors' own implementation of Planetoid or any other code for their described methodology.

Open Datasets | Yes | "We first considered three text classification datasets, Citeseer, Cora and Pubmed (Sen et al., 2008)." (Footnote 5: http://linqs.umiacs.umd.edu/projects//projects/lbc/) "We next considered the DIEL (Distant Information Extraction using coordinate-term Lists) dataset (Bing et al., 2015)." The paper also derives an entity classification dataset from the knowledge base of Never Ending Language Learning (NELL) (Carlson et al., 2010) and a hierarchical entity classification dataset (Dalvi & Cohen, 2016) that links NELL entities to text in ClueWeb09.

Dataset Splits | No | The paper states: "We tune r2, T1, T2, the learning rate and hyper-parameters in other models based on an additional data split with a different random seed." This implies a validation set was used for tuning, but its size, proportion, and construction are not reported, which limits reproducibility.

Hardware Specification | No | The paper provides no details about the hardware used for experiments, such as CPU or GPU models, memory, or cloud computing instances.

Software Dependencies | No | The paper mentions using "the Junto library (Talukdar & Crammer, 2009) for label propagation, and SVMLight for TSVM" (Footnote 4: http://svmlight.joachims.org/), but it does not specify version numbers for these software components, which is necessary for reproducibility.

Experiment Setup | Yes | "In all of our experiments, we set the model hyper-parameters to r1 = 5/6, q = 10, d = 3, N1 = 200 and N2 = 200 for Planetoid. We use the same r1, q and d for GraphEmb, and the same N1 and N2 for ManiReg and SemiEmb."
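The reported hyper-parameter values can be captured as a small configuration sketch. The key names mirror the paper's symbols; the dict name and structure are shorthand for this note, not identifiers from the authors' code.

```python
# Planetoid hyper-parameters as reported in the paper's experiment setup.
# Key names follow the paper's symbols (r1, q, d, N1, N2); their exact
# roles are defined in the paper, and this dict only records the values.
PLANETOID_CONFIG = {
    "r1": 5 / 6,
    "q": 10,
    "d": 3,
    "N1": 200,
    "N2": 200,
}
# Per the paper: GraphEmb reuses r1, q, and d; ManiReg and SemiEmb
# reuse N1 and N2. r2, T1, T2, and the learning rate were tuned on an
# additional data split.
```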
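The pseudocode the paper provides (Algorithm 1, sampling from a context distribution p(i, c, γ)) can be illustrated with a minimal sketch. This assumes a DeepWalk-style random walk for graph-based contexts, label agreement for label-based contexts, and uniform negative sampling; it is an illustration under those assumptions, not the authors' implementation, and the parameter roles (r1, r2, q, d) are interpretations of the paper's symbols.

```python
import random

def sample_context(adj, labels, r1=5 / 6, r2=0.5, q=10, d=3):
    """Hedged sketch of Planetoid-style context sampling.

    Returns a triple (i, c, gamma): with probability r1 the pair comes
    from a random-walk (graph) context, otherwise from a shared-label
    context; with probability r2 the context is replaced by a uniformly
    sampled node and marked negative (gamma = -1).
    """
    nodes = list(adj)
    if random.random() < r1:
        # Graph context: random walk of length q, pair nodes within window d.
        walk = [random.choice(nodes)]
        for _ in range(q - 1):
            nbrs = adj[walk[-1]]
            walk.append(random.choice(nbrs) if nbrs else random.choice(nodes))
        j = random.randrange(len(walk))
        k = random.randrange(max(0, j - d), min(len(walk), j + d + 1))
        i, c = walk[j], walk[k]
    else:
        # Label context: two labeled nodes sharing a label form a pair.
        i = random.choice(list(labels))
        c = random.choice([n for n in labels if labels[n] == labels[i]])
    if random.random() < r2:
        return i, random.choice(nodes), -1  # corrupted (negative) pair
    return i, c, +1
```

During training these sampled triples would drive a skip-gram-style embedding loss, with γ distinguishing positive from negative pairs.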