GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph
Authors: Junhan Yang, Zheng Liu, Shitao Xiao, Chaozhuo Li, Defu Lian, Sanjay Agrawal, Amit Singh, Guangzhong Sun, Xing Xie
NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive evaluations are conducted on three large-scale benchmark datasets, where GraphFormers outperform the SOTA baselines with comparable running efficiency. |
| Researcher Affiliation | Collaboration | University of Science and Technology of China, Hefei, China; Microsoft Research Asia, Beijing, China; Beijing University of Posts and Telecommunications, Beijing, China; Microsoft India Development Center, Bengaluru, India |
| Pseudocode | Yes | Algorithm 1: GraphFormers Workflow |
| Open Source Code | Yes | The source code is released at https://github.com/microsoft/GraphFormers. |
| Open Datasets | Yes | DBLP, which contains the paper citation graph from DBLP up to 2020-04-09. The paper's title is used as the textual feature. Wikidata5M (Wiki) (Wang et al., 2019b), which contains the entity graph from Wikipedia. The first sentence in each entity's introduction is taken as its textual feature. |
| Dataset Splits | Yes | Table 1: Specifications of the experimental datasets: the number of items, the number of neighbour nodes on average, and the number of training, validation, testing cases. ... Product ... #Train 22,146,934 #Valid 30,000 #Test 306,742 |
| Hardware Specification | Yes | The evaluation is made with a Nvidia P100 GPU. |
| Software Dependencies | No | The paper mentions software components like 'UniLM-base', 'BERT-like PLM', and 'WordPiece' but does not specify their version numbers or the versions of general frameworks like PyTorch or TensorFlow. |
| Experiment Setup | Yes | In our experiment, each text is associated with 5 uniformly sampled neighbours (without replacement); for texts with neighbourhoods smaller than 5, all the neighbours are utilized. ... We use the common MLM strategy, where 15% of the input tokens are masked: 80% of them are replaced by [MASK], and the remaining ones are replaced randomly or kept as the original tokens with equal probability. ... Each mini-batch contains 32 encoding instances; each instance contains one center and #N neighbour nodes; the token length of each node is 16. (A hedged sketch of this setup follows the table.) |
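
The neighbour-sampling and MLM masking details quoted in the Experiment Setup row are concrete enough to sketch. The snippet below is a minimal, hypothetical illustration, not the released GraphFormers code; the function names, the `[MASK]` token id, and the `-100` ignore label are assumptions following common BERT-style conventions.

```python
# Hypothetical sketch of the quoted setup: uniform neighbour sampling without
# replacement (capped at 5) and the standard 15% MLM masking with the
# 80% / 10% / 10% replacement scheme. Names and constants are illustrative.
import random

MASK_TOKEN_ID = 103     # assumed [MASK] id for a BERT-style vocabulary
MAX_NEIGHBOURS = 5
MASK_PROB = 0.15

def sample_neighbours(neighbour_ids):
    """Uniformly sample up to 5 neighbours without replacement;
    if a node has fewer than 5 neighbours, use all of them."""
    if len(neighbour_ids) <= MAX_NEIGHBOURS:
        return list(neighbour_ids)
    return random.sample(neighbour_ids, MAX_NEIGHBOURS)

def mask_tokens(token_ids, vocab_size):
    """Common MLM strategy: mask 15% of tokens; of those, 80% become
    [MASK], 10% become a random token, 10% stay unchanged."""
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if random.random() < MASK_PROB:
            labels.append(tok)                 # predict the original token
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_TOKEN_ID      # replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(vocab_size)  # random replacement
            # else: keep the original token
        else:
            labels.append(-100)                # position ignored by the MLM loss
    return inputs, labels
```

Under this reading of the quoted setup, a mini-batch would then group 32 such encoding instances, each holding one center node plus its sampled neighbours, with every node's text truncated to 16 tokens.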