Boost then Convolve: Gradient Boosting Meets Graph Neural Networks

Authors: Sergei Ivanov, Liudmila Prokhorenkova

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | With an extensive experimental comparison to the leading GBDT and GNN models, we demonstrate a significant increase in performance on a variety of graphs with tabular features. The code is available: https://github.com/nd7141/bgnn. We have performed a comparative evaluation of BGNN and Res-GNN against a wide variety of strong baselines and previous approaches on heterogeneous node prediction problems, achieving significant improvement in performance across all of them. This section outlines our experimental setting, the results on node regression and classification problems, and extracted feature representations. The results of our comparative evaluation for node regression are summarized in Table 2.
Researcher Affiliation | Collaboration | Sergei Ivanov (Criteo AI Lab; Skoltech), Paris, France, s.ivanov@criteo.com; Liudmila Prokhorenkova (Yandex; HSE University; MIPT), Moscow, Russia, ostroumova-la@yandex-team.ru
Pseudocode | Yes | Algorithm 1: Training of BGNN. (A hedged sketch of this alternating training loop is given after the table.)
Open Source Code | Yes | The code is available: https://github.com/nd7141/bgnn.
Open Datasets | Yes | We utilize five real-world node regression datasets with different properties outlined in Table 1. Four of these datasets are heterogeneous, i.e., the input features are of different types, scales, and meaning. For example, for the VK dataset, the node features are both numerical (e.g., last time seen on the platform) and categorical (e.g., country of living and university). On the other hand, the Wiki dataset is homogeneous, i.e., the node features are interdependent and correspond to the bag-of-words representations of Wikipedia articles. Additional details about the datasets can be found in Appendix C. For node classification, we use five datasets with different properties. Due to the lack of publicly available datasets with heterogeneous node features, we adopt the datasets House_class and VK_class from the regression task by converting the target labels into several discrete classes. We additionally include two sparse node classification datasets, SLAP and DBLP, coming from heterogeneous information networks (HIN) with nodes having different types. We also include one homogeneous dataset, OGB-ArXiv (Hu et al., 2020a). (An illustrative target-binning sketch follows the table.)
Dataset Splits | Yes | We ensure that the comparison is done fairly by training each model until convergence with a reasonable set of hyperparameters evaluated on the validation set. We run each hyperparameter setting three times and take the average of the results. Furthermore, we have five random splits of the data, and the final number represents the average performance of the model over all five random seeds. More details about hyperparameters can be found in Appendix B. We use five random splits for train/validation/test with a 0.6/0.2/0.2 ratio. (A minimal split-generation sketch follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments. It discusses training time but not the underlying hardware.
Software Dependencies | No | The paper mentions software such as CatBoost, LightGBM, and PyTorch Geometric (used for the GNNs) but does not provide specific version numbers for these dependencies, which is required for reproducibility. (A small version-recording sketch follows the table.)
Experiment Setup | Yes | We ensure that the comparison is done fairly by training each model until convergence with a reasonable set of hyperparameters evaluated on the validation set. More details about hyperparameters can be found in Appendix B.
- LightGBM: number of leaves is {15, 63}, λ_L2 = 0, boosting type is gbdt, number of epochs is 1000, early stopping rounds is 100.
- CatBoost: depth is {4, 6}, λ_L2 = 0, number of epochs is 1000, early stopping rounds is 100.
- FCNN: number of layers is {2, 3}, dropout is {0., 0.5}, hidden dimension is 64, number of epochs is 5000, early stopping rounds is 2000.
- GNN: dropout rate is {0., 0.5}, hidden dimension is 64, number of epochs is 2000, early stopping rounds is 200. GAT, GCN, and AGNN models have two convolutional layers with dropout and ELU activation function (Clevert et al., 2016). APPNP has a two-layer fully-connected neural network with dropout and ELU activation followed by a convolutional layer with k = 10 and α = 0.1. We use eight heads with eight hidden neurons for the GAT model.
- Res-GNN: dropout rate is {0., 0.5}, hidden dimension is 64, number of epochs is 1000, early stopping rounds is 100. We also tune whether to use solely the predictions of the CatBoost model or append them to the input features. The CatBoost model is trained for 1000 epochs.
- BGNN: dropout rate is {0., 0.5}, hidden dimension is 64, number of epochs is 200, early stopping rounds is 10, number of trees and backward passes per epoch is {10, 20}, depth of the tree is 6. We also tune whether to use solely the predictions of the CatBoost model or append them to the input features.
- For all models, we also perform a hyperparameter search on the learning rate in {0.1, 0.01}.
(An illustrative sketch of enumerating these grids follows the table.)
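
To make the Pseudocode entry concrete, below is a minimal sketch of the alternating scheme that Algorithm 1 describes: a GBDT round fits the current targets, its accumulated predictions are appended to the node features, a GNN is trained on the graph, and the gradient of the GNN loss with respect to the appended column defines the next GBDT targets. The library choices (scikit-learn's GradientBoostingRegressor instead of CatBoost, a dense normalized adjacency, a single "propagate after MLP" step in place of a full GNN), the restriction of the feature-update step to the appended prediction column, and all hyperparameters are simplifying assumptions for illustration, not the authors' implementation.

```python
# Minimal, hypothetical sketch of BGNN-style training (Algorithm 1, simplified).
# X: (n, d) numpy feature matrix, A_hat: (n, n) normalized adjacency,
# y: (n,) numpy regression targets, train_mask: (n,) boolean numpy array.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import GradientBoostingRegressor


def train_bgnn_sketch(X, A_hat, y, train_mask, epochs=10, gnn_steps=20, lr=0.01):
    n, d = X.shape
    mask = torch.as_tensor(train_mask)
    A_hat_t = torch.as_tensor(A_hat, dtype=torch.float32)
    y_t = torch.as_tensor(y, dtype=torch.float32).view(-1, 1)

    # Stand-in GNN: transform features with an MLP, then propagate once over A_hat.
    gnn = nn.Sequential(nn.Linear(d + 1, 64), nn.ReLU(), nn.Linear(64, 1))
    opt = torch.optim.Adam(gnn.parameters(), lr=lr)

    f = np.zeros((n, 1))                    # cumulative GBDT prediction
    target = y[train_mask].astype(float)    # first GBDT round fits the raw targets
    gbdt_rounds = []

    for _ in range(epochs):
        # 1) Boost: fit a small GBDT on the current targets, accumulate its output.
        gbdt = GradientBoostingRegressor(n_estimators=10, max_depth=6)
        gbdt.fit(X[train_mask], np.ravel(target))
        gbdt_rounds.append(gbdt)
        f = f + gbdt.predict(X).reshape(-1, 1)

        # 2) Convolve: append the GBDT output to the node features and train the
        #    GNN; features are a leaf tensor so their gradient is available.
        feats = torch.tensor(np.hstack([X, f]), dtype=torch.float32, requires_grad=True)
        for _ in range(gnn_steps):
            opt.zero_grad()
            if feats.grad is not None:
                feats.grad = None            # zero_grad only clears parameter grads
            out = A_hat_t @ gnn(feats)       # one propagation step
            loss = ((out[mask] - y_t[mask]) ** 2).mean()
            loss.backward()
            opt.step()

        # 3) New GBDT targets: the next boosting round approximates a gradient
        #    step on the appended prediction column (a simplification; the paper
        #    updates node features through CatBoost).
        target = -lr * feats.grad[:, -1:].numpy()[train_mask]

    return gbdt_rounds, gnn
```

At inference time, under these assumptions, the accumulated GBDT output would again be appended to the node features and passed through the trained GNN to obtain predictions.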
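The Open Datasets entry notes that House_class and VK_class were derived by discretizing regression targets into several classes. A hypothetical way to perform such binning is sketched below; the quantile rule, the number of classes, and the synthetic stand-in targets are assumptions, not the authors' exact procedure.

```python
# Hypothetical quantile binning of continuous targets into discrete classes.
import numpy as np


def to_classes(y, n_classes=5):
    # Quantile edges give roughly balanced classes even for skewed targets.
    edges = np.quantile(y, np.linspace(0, 1, n_classes + 1)[1:-1])
    return np.digitize(y, edges)


y = np.random.lognormal(mean=0.0, sigma=1.0, size=1000)  # stand-in for house prices
labels = to_classes(y, n_classes=5)
print(np.bincount(labels))  # roughly equal class counts
```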
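For the Dataset Splits protocol (five random 0.6/0.2/0.2 train/validation/test splits), a minimal reproduction sketch could look as follows; the seed values and the helper name make_splits are illustrative, not taken from the paper.

```python
# Five seeded 0.6/0.2/0.2 train/validation/test splits over node indices.
import numpy as np


def make_splits(n_nodes, seeds=(0, 1, 2, 3, 4), ratios=(0.6, 0.2, 0.2)):
    splits = []
    for seed in seeds:
        rng = np.random.default_rng(seed)
        perm = rng.permutation(n_nodes)
        n_train = int(ratios[0] * n_nodes)
        n_val = int(ratios[1] * n_nodes)
        splits.append({
            "train": perm[:n_train],
            "val": perm[n_train:n_train + n_val],
            "test": perm[n_train + n_val:],
        })
    return splits


splits = make_splits(10000)
print(len(splits), [len(s["train"]) for s in splits])  # 5 splits, 6000 training nodes each
```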
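Since the Software Dependencies entry flags missing version numbers, one way a re-implementation could record the versions actually used is sketched below; the package list reflects the libraries named in the review plus PyTorch itself, and torch_geometric is the assumed import name for PyTorch Geometric.

```python
# Print the installed versions of the libraries the experiments depend on.
import importlib

for pkg in ["torch", "torch_geometric", "catboost", "lightgbm"]:
    try:
        mod = importlib.import_module(pkg)
        print(pkg, getattr(mod, "__version__", "unknown"))
    except ImportError:
        print(pkg, "not installed")
```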
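Finally, the hyperparameter grids quoted in the Experiment Setup entry could be encoded and enumerated as below; the dictionary keys and the iter_configs helper are illustrative rather than the authors' tuning code, and only a subset of the grids is shown.

```python
# Enumerate hyperparameter settings from per-model grids (subset, illustrative).
from itertools import product

grids = {
    "LightGBM": {"num_leaves": [15, 63], "learning_rate": [0.1, 0.01]},
    "CatBoost": {"depth": [4, 6], "learning_rate": [0.1, 0.01]},
    "GNN": {"dropout": [0.0, 0.5], "hidden_dim": [64], "learning_rate": [0.1, 0.01]},
    "BGNN": {"dropout": [0.0, 0.5], "trees_per_epoch": [10, 20], "learning_rate": [0.1, 0.01]},
}


def iter_configs(grid):
    # Cartesian product over the per-parameter value lists.
    keys = sorted(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


for config in iter_configs(grids["BGNN"]):
    print(config)  # each setting is run three times and the results averaged
```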