GraphFormers: GNN-nested Transformers for Representation Learning on Textual Graph

Authors: Junhan Yang, Zheng Liu, Shitao Xiao, Chaozhuo Li, Defu Lian, Sanjay Agrawal, Amit Singh, Guangzhong Sun, Xing Xie

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive evaluations are conducted on three large-scale benchmark datasets, where GraphFormers outperform the SOTA baselines with comparable running efficiency.
Researcher Affiliation | Collaboration | University of Science and Technology of China, Hefei, China; Microsoft Research Asia, Beijing, China; Beijing University of Posts and Telecommunications, Beijing, China; Microsoft India Development Center, Bengaluru, India
Pseudocode | Yes | Algorithm 1: GraphFormers Workflow (see the illustrative sketch after the table).
Open Source Code | Yes | The source code is released at https://github.com/microsoft/GraphFormers.
Open Datasets | Yes | DBLP, which contains the paper citation graph from DBLP up to 2020-04-09; the paper's title is used as the textual feature. Wikidata5M (Wiki) (Wang et al., 2019b), which contains the entity graph from Wikipedia; the first sentence in each entity's introduction is taken as its textual feature.
Dataset Splits | Yes | Table 1: Specifications of the experimental datasets: the number of items, the number of neighbour nodes on average, and the number of training, validation, and testing cases. ... Product ... #Train 22,146,934 #Valid 30,000 #Test 306,742
Hardware Specification | Yes | The evaluation is made with an Nvidia P100 GPU.
Software Dependencies | No | The paper mentions software components like 'UniLM-base', 'BERT-like PLM', and 'WordPiece' but does not specify their version numbers or the versions of general frameworks like PyTorch or TensorFlow.
Experiment Setup | Yes | In our experiment, each text is associated with 5 uniformly sampled neighbours (without replacement); for texts with a neighbourhood smaller than 5, all the neighbours will be utilized. ... We use the common MLM strategy, where 15% of the input tokens are masked: 80% of them are replaced by [MASK], and the rest are replaced randomly or kept as the original tokens with equal probability. ... Each mini-batch contains 32 encoding instances; each instance contains one center and #N neighbour nodes; the token length of each node is 16.
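
The Pseudocode row above points to Algorithm 1 (the GraphFormers workflow) but does not reproduce it. Below is a minimal, illustrative sketch of a single GNN-nested transformer layer, assuming a simplified fusion scheme: at every layer, the first-token (CLS) states of the centre node and its neighbours are aggregated by multi-head attention over the node set, and each node's graph-aware embedding is prepended to its token sequence before a standard transformer layer is applied. The class and tensor names are hypothetical; consult Algorithm 1 in the paper and the released code for the exact formulation.

```python
import torch
import torch.nn as nn


class GNNNestedLayer(nn.Module):
    """One GNN-nested transformer layer (illustrative sketch, not the released code)."""

    def __init__(self, hidden: int = 768, heads: int = 12):
        super().__init__()
        # Graph aggregation over the node-level (first-token) embeddings.
        self.graph_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # Standard transformer encoder layer applied to each node's token sequence.
        self.text_layer = nn.TransformerEncoderLayer(
            d_model=hidden, nhead=heads, dim_feedforward=4 * hidden, batch_first=True
        )

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: [batch, nodes, seq_len, hidden]; node 0 is the centre node.
        b, n, s, h = token_states.shape
        # 1) Node embeddings = first-token states of every node in the instance.
        cls_states = token_states[:, :, 0, :]                        # [b, n, h]
        # 2) Exchange information across nodes (the nested GNN step).
        graph_states, _ = self.graph_attn(cls_states, cls_states, cls_states)
        # 3) Prepend each node's graph-aware embedding to its token sequence.
        augmented = torch.cat([graph_states.unsqueeze(2), token_states], dim=2)
        # 4) Run the transformer layer per node, then drop the extra graph token.
        out = self.text_layer(augmented.reshape(b * n, s + 1, h))
        return out[:, 1:, :].reshape(b, n, s, h)


# Example: 2 instances, 1 centre + 5 neighbours, 16 tokens per node (as in the setup above).
layer = GNNNestedLayer()
x = torch.randn(2, 6, 16, 768)
y = layer(x)   # same shape as x; stack layers and read out the centre node's first token
```

Stacking several such layers and taking the centre node's first-token state after the last layer would yield a node representation in the spirit of the paper; again, this is a sketch under the stated assumptions, not the authors' implementation.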
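
The Experiment Setup row quotes the paper's neighbour-sampling and MLM-masking recipe. The snippet below sketches those two steps under the stated numbers: 5 neighbours sampled uniformly without replacement, 15% of tokens masked, of which 80% become [MASK] and the remaining 20% are split evenly between random replacement and keeping the original token. The helper names, the [MASK] id, and the vocabulary size are illustrative assumptions, not values taken from the paper or its code.

```python
import random

MASK_ID = 103          # assumed [MASK] token id (BERT-style WordPiece vocabulary)
VOCAB_SIZE = 30522     # assumed vocabulary size


def sample_neighbours(neighbour_ids, k=5):
    """Uniformly sample k neighbours without replacement; keep all if fewer than k exist."""
    if len(neighbour_ids) <= k:
        return list(neighbour_ids)
    return random.sample(neighbour_ids, k)


def mask_tokens(token_ids, mask_prob=0.15):
    """BERT-style MLM masking: 15% of positions are selected; of those,
    80% -> [MASK], 10% -> random token, 10% -> kept unchanged."""
    inputs, labels = list(token_ids), []
    for i, tok in enumerate(token_ids):
        if random.random() < mask_prob:
            labels.append(tok)                            # predict the original token here
            r = random.random()
            if r < 0.8:
                inputs[i] = MASK_ID                       # replace with [MASK]
            elif r < 0.9:
                inputs[i] = random.randrange(VOCAB_SIZE)  # replace with a random token
            # else: keep the original token unchanged
        else:
            labels.append(-100)                           # position ignored by the MLM loss
    return inputs, labels
```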