Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MaxCutBench: Revisiting and Benchmarking Graph Neural Networks for Maximum Cut
Authors: Ankur Nath, Alan Kuhnle
TMLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using our benchmark, we conduct an in-depth analysis of the implemented algorithms on a carefully selected set of hard instances from diverse graph datasets. |
| Researcher Affiliation | Academia | Ankur Nath (EMAIL), Department of Computer Science & Engineering, Texas A&M University; Alan Kuhnle (EMAIL), Department of Computer Science & Engineering, Texas A&M University |
| Pseudocode | Yes | Algorithm 1: Tabu Search; Algorithm 2: Extremal Optimization Algorithm |
| Open Source Code | Yes | Code, data, and pre-trained models are available at: https://github.com/ankurnath/MaxCut-Bench. |
| Open Datasets | Yes | Gset. This dataset (Ye, 2003) is extensively used to benchmark classical heuristics (Benlic & Hao, 2013; Leleu et al., 2019; 2021) for Max Cut. The dataset comprises three types of weighted and unweighted random graphs: ER graphs with uniform edge probabilities, Planar graphs with decaying connectivity, and regular Toroidal graphs. For generating training and validation distributions, we use the independent graph generator Rudy by Giovanni Rinaldi, which is used for generating Gset graphs as sourced from Ye (2003). |
| Dataset Splits | Yes | All training is performed on 4000 randomly generated graphs, and validation is performed on a fixed set of 50 held-out graphs from the same distribution. For synthetic datasets, testing is performed on 100 instances drawn from the same distribution, or on the test instances provided in the original resource (see details in Appendix A.1). |
| Hardware Specification | Yes | All experiments were conducted on a Linux server with a GPU (NVIDIA RTX A6000) and CPU (AMD EPYC 7713), using PyTorch 2.3.0, DGL 2.2.1 (Wang, 2019) and Python 3.11.9. |
| Software Dependencies | Yes | All experiments were conducted on a Linux server with a GPU (NVIDIA RTX A6000) and CPU (AMD EPYC 7713), using PyTorch 2.3.0, DGL 2.2.1 (Wang, 2019) and Python 3.11.9. |
| Experiment Setup | Yes | For all other algorithms, we run each algorithm for 50 randomly initialized episodes with 2|V | number of search steps per episode and select the best outcome from these runs following the experimental setup of Barrett et al. (2020) and Yao et al. (2021). Experimentally, we have found that the performance of all algorithms saturates within this number of search steps (we refer the reader to Appendix A.6 for more details). |
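The quoted evaluation protocol (50 randomly initialized episodes, 2|V| search steps each, best outcome kept) can be sketched as follows. This is an illustrative stand-in, not the benchmark's actual code: it uses a simple greedy single-vertex-flip local search in place of the implemented algorithms, and all function names (`greedy_flip_episode`, `best_of_episodes`) are hypothetical.

```python
import random

def greedy_flip_episode(adj, num_steps, rng):
    """One randomly initialized episode: start from a random partition
    and greedily flip vertices that improve the cut value.
    adj[u] is a list of (neighbor, weight) pairs (symmetric)."""
    n = len(adj)
    side = [rng.random() < 0.5 for _ in range(n)]
    # Initial cut value: sum of weights of edges crossing the partition.
    cut = sum(w for u in range(n) for v, w in adj[u]
              if u < v and side[u] != side[v])
    for _ in range(num_steps):
        v = rng.randrange(n)
        # Gain from flipping v: same-side edges become cut, cut edges become uncut.
        gain = sum(w if side[v] == side[u] else -w for u, w in adj[v])
        if gain > 0:
            side[v] = not side[v]
            cut += gain
    return cut

def best_of_episodes(adj, episodes=50, seed=0):
    """Best-of-k protocol from the quoted setup: 2|V| search steps
    per episode, return the best cut value found across episodes."""
    rng = random.Random(seed)
    n = len(adj)
    return max(greedy_flip_episode(adj, 2 * n, rng) for _ in range(episodes))
```

Running `best_of_episodes` on a small graph (e.g. an unweighted triangle) returns the optimal cut of 2, since at least one of the 50 random restarts reaches it within 2|V| improving flips.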