Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
The Underappreciated Power of Vision Models for Graph Structural Understanding
Authors: Xinjian Zhao, Wei Pang, Zhongkai Xue, Xiangru Jian, Lei Zhang, Yaoyao Xu, Xiaozhuang Song, Shu Wu, Tianshu Yu
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our evaluations demonstrate that pure vision encoders perform comparably to specialized GNNs on established graph benchmarks... Our experiments yield several key findings: On tasks requiring the abstraction of global graph properties, vision models demonstrate significant advantages and superior generalization capabilities... Table 1: Performance comparison on different datasets. Results show the accuracy (%) of different models, reported as mean ± std over 5 runs. |
| Researcher Affiliation | Academia | School of Data Science, The Chinese University of Hong Kong, Shenzhen; Institute of Automation, Chinese Academy of Sciences; Cheriton School of Computer Science, University of Waterloo |
| Pseudocode | Yes | E.4 Algorithmic Implementation The implementation of graph coverings in our code precisely follows the mathematical constructions in the above definitions: Algorithm 1 Generate Bipartite Double Cover ... Algorithm 2 Generate k-fold Cyclic Cover from Real-world Network |
| Open Source Code | Yes | The code is available at https://github.com/LOGO-CUHKSZ/GraphAbstract |
| Open Datasets | Yes | Traditional benchmarks in domains like molecular prediction, citation networks, and protein interaction graphs inadvertently couple domain-specific node features with topology... To enhance diversity and realism in our generated graphs, we extracted a collection of base graphs from real-world datasets. For MUTAG, we directly utilized the molecular graphs. For Cora, which is a large citation network... [47] TUDataset: A collection of benchmark datasets for learning with graphs. arXiv preprint arXiv:2007.08663, 2020. |
| Dataset Splits | Yes | Our evaluation includes three test settings of increasing difficulty: ID (In-Distribution) setting uses test graphs containing 20-50 nodes, matching the training distribution. Near-OOD (Near Out-of-Distribution) setting contains graphs with 40-100 nodes, representing a moderate scale shift. Far-OOD (Far Out-of-Distribution) setting features graphs with 60-150 nodes, constituting a significant scale shift. Table 4: Dataset statistics across our four benchmark tasks. Each entry shows the number of graphs followed by the node count range in parentheses, in the order Topology / Symmetry / Spectral Gap / Bridge Count. Train: 3000 (20-50) / 2000 (30-60) / 3000 (20-50) / 2500 (20-50). Val: 300 (20-50) / 200 (30-60) / 300 (20-50) / 250 (20-50). Test (ID): 300 (20-50) / 600 (30-60) / 300 (20-50) / 250 (20-50). Test (Near-OOD): 300 (40-100) / 600 (50-100) / 300 (40-100) / 250 (40-100). Test (Far-OOD): 300 (60-150) / 600 (70-150) / 300 (60-150) / 250 (60-150). |
| Hardware Specification | Yes | All experiments are conducted on 4 NVIDIA A800 GPUs. |
| Software Dependencies | No | All datasets are implemented using PyTorch Geometric. We use the Adam optimizer. We use the pynauty library to verify the symmetry. The specific version numbers for these software components are not provided. |
| Experiment Setup | Yes | All models are trained with a batch size of 128 for a maximum of 200 epochs, employing early stopping with a patience of 30 epochs to prevent overfitting. We use the Adam optimizer with different learning rates: 1e-5 for vision backbone parameters, 1e-3 for GNN models and classifier heads. Weight decay is set to 1e-4 for vision models. For classification tasks (Topology, Symmetry), we use cross-entropy loss, while for regression tasks (Spectral Gap, Bridge Counting), we employ mean squared error loss. All experiments are conducted on 4 NVIDIA A800 GPUs. For consistent evaluation, we measure accuracy for classification tasks, while regression tasks use Mean Absolute Error (MAE). To ensure reproducibility, we set fixed random seeds [0, 1, 2, 3, 4] for all experiments, controlling the initialization of model parameters, data splitting. For our graph neural network models, we experiment with varying numbers of layers ranging from 2 to 4, with a consistent hidden dimension size of 128 across all architectures. Dropout with a rate of 0.5 is applied throughout the networks to prevent overfitting. For vision-based models, we use standard architectures: ResNet-50, ViT-B/16, Swin Transformer-Tiny, and ConvNeXt V2-Tiny. All models resize graph images to 224 × 224 resolution as input. |
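
The training schedule quoted in the Experiment Setup row (at most 200 epochs, early stopping with a patience of 30) can be illustrated with a minimal, framework-free sketch. The numeric values come from that row; the names `TrainConfig`, `EarlyStopping`, and `run` are hypothetical, and the choice to monitor validation loss is an assumption, since the quoted text does not say which quantity triggers early stopping. This is not the authors' implementation.

```python
from dataclasses import dataclass


# Hyperparameters as quoted in the Experiment Setup row.
# Field names are hypothetical; only the values come from the source.
@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 128
    max_epochs: int = 200
    patience: int = 30
    lr_vision_backbone: float = 1e-5
    lr_gnn_and_heads: float = 1e-3
    weight_decay_vision: float = 1e-4
    seeds: tuple = (0, 1, 2, 3, 4)
    image_size: int = 224


class EarlyStopping:
    """Signal a stop once the monitored value has not improved for `patience` epochs.

    Assumption: the monitored value is a validation loss (lower is better).
    """

    def __init__(self, patience: int):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch and return True if training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


def run(val_losses, cfg=TrainConfig()):
    """Replay per-epoch validation losses and return how many epochs would run."""
    stopper = EarlyStopping(cfg.patience)
    epochs_run = 0
    for epoch, loss in enumerate(val_losses[: cfg.max_epochs], start=1):
        epochs_run = epoch
        if stopper.step(loss):
            break
    return epochs_run
```

Under these assumptions, a run whose validation loss stops improving after the first epoch would halt after 31 epochs (one improving epoch plus 30 without improvement), while a run that keeps improving is capped at 200 epochs. One such schedule would be repeated per seed in `TrainConfig.seeds` to obtain the mean ± std figures the report mentions.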