Adaptive Sampling Towards Fast Graph Representation Learning

Authors: Wenbing Huang, Tong Zhang, Yu Rong, Junzhou Huang

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Intensive experiments on several benchmarks verify the effectiveness of our method regarding classification accuracy while enjoying faster convergence speed. We evaluate the performance of our methods on the following benchmarks: (1) categorizing academic papers in the citation network datasets Cora, Citeseer and Pubmed [11]; (2) predicting which community different posts belong to in Reddit [3]. These graphs vary in size from small to large.
Researcher Affiliation | Collaboration | Wenbing Huang (1), Tong Zhang (2), Yu Rong (1), Junzhou Huang (1). 1: Tencent AI Lab; 2: Australian National University.
Pseudocode | No | No structured pseudocode or algorithm blocks were found in the paper.
Open Source Code | No | The paper mentions re-implementing other methods' code and comparing against the public code of FastGCN, but it does not provide an explicit statement or link for open-source code of the authors' own 'Adapt' method.
Open Datasets | Yes | We evaluate the performance of our methods on the following benchmarks: (1) categorizing academic papers in the citation network datasets Cora, Citeseer and Pubmed [11]; (2) predicting which community different posts belong to in Reddit [3].
Dataset Splits | No | We train all models using early stopping with a window size of 30, as suggested by [9], and report the results corresponding to the best validation accuracies. However, the paper does not specify exact split percentages or sample counts for the training, validation, and test sets.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory amounts used for running experiments. It discusses computational cost in general terms but gives no hardware specifics.
Software Dependencies | No | The paper mentions 'TensorFlow [24]' but does not provide a specific version number, nor any other software dependencies with version numbers required for replication.
Experiment Setup | Yes | The hidden dimensions for the citation network datasets (i.e., Cora, Citeseer and Pubmed) are set to be 16. For the Reddit dataset, the hidden dimensions are selected to be 256 as suggested by [3]. The numbers of the sampling nodes for all layers excluding the top one are set to 128 for Cora and Citeseer, 256 for Pubmed and 512 for Reddit. The sizes of the top layer (i.e. the stochastic mini-batch size) are chosen to be 256 for all datasets. We train all models using early stopping with a window size of 30, as suggested by [9], and report the results corresponding to the best validation accuracies. ... λ is the trade-off parameter and fixed as 0.5 in our experiments.
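The hyperparameters reported in this row can be collected into a small configuration sketch. This is a hypothetical reconstruction, not the authors' released code: the dictionary keys and the `should_stop` helper are illustrative names, and the early-stopping rule is one plausible reading of "a window size of 30".

```python
# Hypothetical per-dataset configuration, assembled from the values quoted
# in the Experiment Setup row above (illustrative, not the authors' code).
CONFIGS = {
    # dataset: hidden dim, sampled nodes per lower layer, top-layer batch size
    "cora":     {"hidden_dim": 16,  "n_samples": 128, "batch_size": 256},
    "citeseer": {"hidden_dim": 16,  "n_samples": 128, "batch_size": 256},
    "pubmed":   {"hidden_dim": 16,  "n_samples": 256, "batch_size": 256},
    "reddit":   {"hidden_dim": 256, "n_samples": 512, "batch_size": 256},
}

# Trade-off parameter λ, fixed at 0.5 in the paper's experiments.
TRADE_OFF_LAMBDA = 0.5


def should_stop(val_accuracies, window=30):
    """One plausible reading of window-30 early stopping: stop once the best
    validation accuracy has not improved within the last `window` epochs."""
    if len(val_accuracies) <= window:
        return False
    best = max(val_accuracies)
    # Stop if the best score occurred more than `window` epochs ago.
    return best not in val_accuracies[-window:]
```

For example, a run whose best validation accuracy occurred more than 30 epochs ago would trigger `should_stop`, while any run shorter than the window never stops early.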