Revisiting Semi-Supervised Learning with Graph Embeddings
Authors: Zhilin Yang, William Cohen, Ruslan Salakhutdinov
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On a large and diverse set of benchmark tasks, including text classification, distantly supervised entity extraction, and entity classification, we show improved performance over many of the existing models. |
| Researcher Affiliation | Academia | Zhilin Yang ZHILINY@CS.CMU.EDU William W. Cohen WCOHEN@CS.CMU.EDU Ruslan Salakhutdinov RSALAKHU@CS.CMU.EDU School of Computer Science, Carnegie Mellon University |
| Pseudocode | Yes | Algorithm 1 Sampling Context Distribution p(i, c, γ) Algorithm 2 Model Training (Transductive) |
| Open Source Code | No | The paper references an implementation of a baseline method (Deepwalk) via a GitHub link (Footnote 3: "https://github.com/phanein/deepwalk"), but it does not provide an explicit statement or link to the authors' own implementation of Planetoid or other code for their described methodology. |
| Open Datasets | Yes | We first considered three text classification datasets5, Citeseer, Cora and Pubmed (Sen et al., 2008). (Footnote 5: http://linqs.umiacs.umd.edu/projects//projects/lbc/) We next considered the DIEL (Distant Information Extraction using coordinate-term Lists) dataset (Bing et al., 2015). We sorted out an entity classification dataset from the knowledge base of Never Ending Language Learning (NELL) (Carlson et al., 2010) and a hierarchical entity classification dataset (Dalvi & Cohen, 2016) that links NELL entities to text in ClueWeb09. |
| Dataset Splits | No | The paper states: "We tune r2, T1, T2, the learning rate and hyper-parameters in other models based on an additional data split with a different random seed." This implies the use of a validation set for tuning, but it does not provide specific details (e.g., size, percentage, how it was created) of this validation split for reproducibility. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for experiments, such as CPU or GPU models, memory, or cloud computing instances. |
| Software Dependencies | No | The paper mentions using "the Junto library (Talukdar & Crammer, 2009) for label propagation, and SVMLight4 for TSVM" (Footnote 4: http://svmlight.joachims.org/). However, it does not specify version numbers for these software components, which is necessary for reproducibility. |
| Experiment Setup | Yes | In all of our experiments, we set the model hyper-parameters to r1 = 5/6, q = 10, d = 3, N1 = 200 and N2 = 200 for Planetoid. We use the same r1, q and d for Graph Emb, and the same N1 and N2 for Mani Reg and Semi Emb. |
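The reported hyper-parameter settings can be collected into a single configuration for reference. This is a minimal sketch: the dictionary name is illustrative, only the values come from the paper, and the comments paraphrase the paper's notation (they are not verbatim definitions).

```python
# Hyper-parameters reported for Planetoid in the paper's experiment setup.
# Dictionary name and comments are illustrative; values are as stated in the paper.
planetoid_hparams = {
    "r1": 5 / 6,  # ratio governing context sampling (per the paper's notation)
    "q": 10,      # random-walk parameter used in context sampling
    "d": 3,       # window-size parameter used in context sampling
    "N1": 200,    # batch size N1 (shared with Mani Reg and Semi Emb baselines)
    "N2": 200,    # batch size N2 (shared with Mani Reg and Semi Emb baselines)
}

# The paper notes that Graph Emb reuses r1, q, and d, so the baseline config
# can be derived by selecting those keys.
graph_emb_hparams = {k: planetoid_hparams[k] for k in ("r1", "q", "d")}
print(graph_emb_hparams)
```

Note that r2, T1, T2, and the learning rate are tuned on a separate data split (see the Dataset Splits row), so they are intentionally absent here.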