Rethinking the Capacity of Graph Neural Networks for Branching Strategy

Authors: Ziang Chen, Jialin Liu, Xiaohan Chen, Xinshang Wang, Wotao Yin

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | A small-scale numerical experiment is conducted to directly validate our theoretical findings. We implement numerical experiments to validate our theoretical findings in Section 4.
Researcher Affiliation | Collaboration | Ziang Chen, Massachusetts Institute of Technology, ziang@mit.edu; Jialin Liu, University of Central Florida, jialin.liu@ucf.edu; Xiaohan Chen, Xinshang Wang, and Wotao Yin, Alibaba US, DAMO Academy, {xiaohan.chen,xinshang.w,wotao.yin}@alibaba-inc.com
Pseudocode | Yes | Algorithm 1: The WL test for MILP-Graphs; Algorithm 2: The 2-FWL test for MILP-Graphs. (A colour-refinement sketch in this spirit appears after the table.)
Open Source Code | Yes | We provide the source code and data to reproduce the main results, along with sufficient instructions to run the code, in the supplementary materials.
Open Datasets | No | We randomly generate 100 MILP instances, with 6 constraints and 20 variables... We train the MP-GNN and 2-FGNN to fit the SB scores of (4.1) and (4.2)... (An instance-generation sketch appears after the table.)
Dataset Splits | No | The paper mentions training data and training loss ('average ℓ2 norm of errors across all training instances') but does not explicitly state a validation set or split for hyperparameter tuning.
Hardware Specification | Yes | The numerical experiments are conducted on a single NVIDIA Tesla V100 GPU.
Software Dependencies | Yes | We implement MP-GNN and 2-FGNN with Python 3.6 and TensorFlow 1.15.0 [1].
Experiment Setup | Yes | We train an MP-GNN and a 2-FGNN with L = 2... p^0, q^0 are parameterized as linear transformations followed by a non-linear activation function; {p^l, q^l, f^l, g^l} for l = 1, …, L are parameterized as 3-layer multi-layer perceptrons (MLPs) with respective learnable parameters; and the output mapping r is parameterized as a 2-layer MLP. All layers map their input to a 1024-dimensional vector and use the ReLU activation function. ... minimizing ½ ∑_{G∈𝒢} ‖F_θ(G) − SB(G)‖² with respect to θ, using Adam [31]. ... with a learning rate of 10⁻⁵ for all networks. We decay the learning rate to 10⁻⁶ and 10⁻⁷ when the training error reaches 10⁻⁶ and 10⁻¹², respectively. (An architecture and training sketch appears after the table.)
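
For readers unfamiliar with these tests, the following is a minimal sketch of a WL-style colour refinement on the constraint/variable bipartite graph of an MILP. It is an illustration in the spirit of the paper's Algorithm 1 rather than its exact pseudocode; the node features, hashing scheme, and names (wl_refine_milp, _colour) are assumptions.

```python
# Hedged sketch of WL-style colour refinement on a bipartite MILP graph
# (constraint nodes vs. variable nodes); not the authors' exact Algorithm 1.
import hashlib

def _colour(obj):
    """Deterministic short hash used as a node colour."""
    return hashlib.sha256(repr(obj).encode()).hexdigest()[:16]

def wl_refine_milp(con_feats, var_feats, edges, rounds=10):
    """con_feats: features of constraint nodes (e.g. right-hand side, sense).
    var_feats: features of variable nodes (e.g. objective coefficient, bounds, integrality).
    edges: dict {(i, j): A_ij} over nonzero constraint coefficients.
    Returns sorted colour multisets; two MILP graphs the WL test cannot
    distinguish end up with identical multisets."""
    con = [_colour(f) for f in con_feats]
    var = [_colour(f) for f in var_feats]
    for _ in range(rounds):
        # refine both sides simultaneously from the previous colours
        new_con = [
            _colour((con[i], tuple(sorted((var[j], a) for (ii, j), a in edges.items() if ii == i))))
            for i in range(len(con))
        ]
        new_var = [
            _colour((var[j], tuple(sorted((con[i], a) for (i, jj), a in edges.items() if jj == j))))
            for j in range(len(var))
        ]
        con, var = new_con, new_var
    return sorted(con), sorted(var)
```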
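
To make the quoted data setup concrete, here is a minimal sketch of a generator for random MILP instances of the stated size (6 constraints, 20 variables). The sampling distributions, sparsity, and integrality split are assumptions; the paper's supplementary materials contain the actual generator.

```python
# Hedged sketch: random MILP instances of the quoted size (6 x 20).
# Distributions and sparsity below are assumptions, not the paper's recipe.
import numpy as np

def random_milp(m=6, n=20, density=0.5, seed=None):
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((m, n)) * (rng.random((m, n)) < density)  # constraint matrix
    b = rng.standard_normal(m)                                        # right-hand sides
    c = rng.standard_normal(n)                                        # objective coefficients
    integer_mask = rng.random(n) < 0.5                                # which variables are integer
    return A, b, c, integer_mask

instances = [random_milp(seed=s) for s in range(100)]  # 100 instances, as in the paper
```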
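
The setup quoted above pins down the layer parameterization, width, activation, loss, and learning-rate schedule. The sketch below mirrors that description in PyTorch (the paper's own implementation uses Python 3.6 and TensorFlow 1.15.0); the bipartite message-passing form, aggregation through the coefficient matrix A, and the input feature sizes are assumptions, and only the MP-GNN, not the 2-FGNN, is sketched.

```python
# Hedged PyTorch sketch of the quoted MP-GNN setup (the paper uses TensorFlow 1.15).
# Width 1024, ReLU, L = 2, 3-layer MLP updates, 2-layer MLP output head, and the
# 1/2 * sum_G ||F_theta(G) - SB(G)||^2 loss with Adam at lr 1e-5 follow the text;
# the message-passing equations and feature sizes are assumptions.
import torch
import torch.nn as nn

def mlp(d_in, d_out, hidden=1024, depth=3):
    layers, d = [], d_in
    for _ in range(depth - 1):
        layers += [nn.Linear(d, hidden), nn.ReLU()]
        d = hidden
    layers.append(nn.Linear(d, d_out))
    return nn.Sequential(*layers)

class MPGNN(nn.Module):
    def __init__(self, d_con, d_var, d=1024, L=2):
        super().__init__()
        self.p0 = nn.Sequential(nn.Linear(d_con, d), nn.ReLU())        # initial constraint embedding p^0
        self.q0 = nn.Sequential(nn.Linear(d_var, d), nn.ReLU())        # initial variable embedding q^0
        self.p = nn.ModuleList([mlp(2 * d, d) for _ in range(L)])      # constraint-side updates p^l
        self.q = nn.ModuleList([mlp(2 * d, d) for _ in range(L)])      # variable-side updates q^l
        self.f = nn.ModuleList([mlp(d, d) for _ in range(L)])          # variable -> constraint messages f^l
        self.g = nn.ModuleList([mlp(d, d) for _ in range(L)])          # constraint -> variable messages g^l
        self.r = mlp(d, 1, depth=2)                                    # output head r (predicted SB score)

    def forward(self, A, con_feat, var_feat):
        s = self.p0(con_feat)                                          # (m, d) constraint embeddings
        t = self.q0(var_feat)                                          # (n, d) variable embeddings
        for l in range(len(self.p)):
            s = self.p[l](torch.cat([s, A @ self.f[l](t)], dim=1))     # aggregate variable messages
            t = self.q[l](torch.cat([t, A.t() @ self.g[l](s)], dim=1)) # aggregate constraint messages
        return self.r(t).squeeze(-1)                                   # one score per variable

# One optimization step on a random instance of the quoted size (6 x 20);
# the lr would later be decayed to 1e-6 / 1e-7 per the schedule above.
A = torch.randn(6, 20)
con_feat, var_feat = torch.randn(6, 2), torch.randn(20, 3)             # feature dimensions are assumptions
sb_target = torch.rand(20)                                             # stand-in for true SB scores
model = MPGNN(d_con=2, d_var=3)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
loss = 0.5 * ((model(A, con_feat, var_feat) - sb_target) ** 2).sum()
loss.backward()
optimizer.step()
```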