Surprisingly Strong Performance Prediction with Neural Graph Features
Authors: Gabriela Kadlecová, Jovita Lukasik, Martin Pilát, Petra Vidnerová, Mahmoud Safari, Roman Neruda, Frank Hutter
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our main contributions are summarized as follows: 1. We examine biases in zero-cost proxies (ZCP) and show that some of them directly depend on the number of convolutions. We also demonstrate that they are poor at distinguishing structurally similar networks. 2. We propose neural graph features (GRAF), interpretable features that outperform ZCP in accuracy prediction tasks using a random forest, and yield an even better performance when combined with ZCP. 3. Using GRAF's interpretability, we demonstrate that different tasks favor diverse network properties. 4. We evaluate GRAF on tasks beyond accuracy prediction, and compare with different encodings and predictors. The combination of using ZCP and GRAF as prediction input outperforms most existing methods at a fraction of the compute. |
| Researcher Affiliation | Academia | 1 Charles University, Faculty of Mathematics and Physics; 2 The Czech Academy of Sciences, Institute of Computer Science; 3 University of Siegen; 4 University of Freiburg; 5 ELLIS Institute Tübingen. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | We make the code of our contributions available publicly: https://github.com/gabikadlecova/zc_combine |
| Open Datasets | Yes | In our work, we will use precomputed zero-cost proxy scores from NAS-Bench-Suite-Zero (Krishnakumar et al., 2022). Refer to Section C.3 in the appendix for more details about the provided zero-cost proxies, and Section C.1 for the used benchmarks, datasets, and abbreviations. For NB101 and NB301, zero-cost proxies were computed only for a fraction of the search space, and all subsequent experiments are evaluated on these samples. For NB201 and TNB101-micro, we also exclude networks with unreachable branches due to zero operations. |
| Dataset Splits | Yes | We evaluate the different settings on all available benchmarks and datasets, for 3 train sample sizes (32, 128 and 1024) and across 50 seeds. We report Kendall tau for every run. Full results are available in Section F in the appendix. |
| Hardware Specification | Yes | Table 4. Numbers of GRAF features and time needed for GRAF computation (average of 10 evaluations, across all available networks from the benchmark samples) on AMD Ryzen 7 3800X. [...] A single training and evaluation run on the NB201 benchmark with 32 training networks takes roughly between 35s on an NVIDIA A100-SXM4-40GB and 70s on an NVIDIA GeForce GTX 1080 Ti. With 128 training networks, it takes similar time between 40s and 70s thanks to the larger batch size. With 1024 training networks, the runs take between 3 minutes on an NVIDIA A100-SXM4-40GB and around 6.5 minutes on an NVIDIA Tesla T4. [...] The total time required to run all 50 repetitions and 4 sample sizes is listed in Table 32 (on Intel Xeon CPU E5-2620, NVIDIA GeForce GTX 1080 Ti, no networks from the search space were trained since we use precomputed search space data). [...] In total, we used 21 CPU days for each of the 4 tasks, with 32GB RAM and 8 cores allocated per run, on Intel Xeon Gold 6242 CPU. |
| Software Dependencies | No | The paper mentions software like 'scikit-learn' (Pedregosa et al., 2011) and 'XGBoost' (Chen & Guestrin, 2016) but does not provide specific version numbers for these or other libraries/frameworks used. |
| Experiment Setup | Yes | We evaluate the different settings on all available benchmarks and datasets, for 3 train sample sizes (32, 128 and 1024) and across 50 seeds. We report Kendall tau for every run. Full results are available in Section F in the appendix. [...] Table 5. XGB+ hyperparameters: tree_method hist, subsample 0.9, n_estimators 10000, learning_rate 0.01. [...] Table 27. BRP-NAS hyper-parameters: num_layers 4, num_hidden 600, dropout_ratio 0.002, weight_init thomas, bias_init thomas, epochs 128, learning_rate 4e-4, weight_decay 5e-5, lr_patience 10, batch_size 1 (train = 32) or 16 (otherwise), optim_name adamw, lr_scheduler plateau. |
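
The Experiment Setup and Dataset Splits rows describe a concrete evaluation protocol: train the XGB+ predictor on 32, 128, or 1024 sampled networks, repeat over 50 seeds, and report Kendall tau on the held-out networks. The sketch below is a minimal, hypothetical reconstruction of that loop, not the authors' pipeline (which lives in the linked zc_combine repository); the feature matrix `X` (ZCP and/or GRAF features per network) and the accuracy vector `y` are assumed to be precomputed.

```python
# Minimal sketch of the evaluation protocol quoted above (not the authors'
# exact pipeline): fit the XGB+ predictor with the Table 5 hyperparameters on
# small training samples and report Kendall tau on the remaining networks.
# `X` (ZCP/GRAF features per network) and `y` (validation accuracies) are
# assumed to be precomputed, e.g. from NAS-Bench-Suite-Zero data.
import numpy as np
from scipy.stats import kendalltau
from sklearn.model_selection import train_test_split
from xgboost import XGBRegressor


def evaluate_predictor(X, y, train_sizes=(32, 128, 1024), n_seeds=50):
    results = {}
    for train_size in train_sizes:
        taus = []
        for seed in range(n_seeds):
            X_tr, X_te, y_tr, y_te = train_test_split(
                X, y, train_size=train_size, random_state=seed
            )
            model = XGBRegressor(  # "XGB+" hyperparameters from Table 5
                tree_method="hist",
                subsample=0.9,
                n_estimators=10000,
                learning_rate=0.01,
            )
            model.fit(X_tr, y_tr)
            tau, _ = kendalltau(y_te, model.predict(X_te))
            taus.append(tau)
        results[train_size] = (np.mean(taus), np.std(taus))
    return results
```

The paper reports Kendall tau for every run; the mean and standard deviation computed here are only an aggregation choice for compactness.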
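For intuition about the "neural graph features" named in the contribution summary, the sketch below computes two illustrative features of that flavor on a cell encoded as a NetworkX DiGraph with an `op` attribute on each edge (an NB201-style, operations-on-edges encoding is assumed). These are not the authors' exact feature definitions, which are specified in the zc_combine repository.

```python
# Illustrative sketch of GRAF-style graph features (not the authors' exact
# definitions): per-operation counts, and the shortest input-to-output path
# using only an allowed subset of operations, for a cell given as a DiGraph
# whose edges carry an "op" attribute.
from collections import Counter
import networkx as nx


def op_counts(cell: nx.DiGraph, op_set):
    # Count how often each operation from a fixed vocabulary appears in the cell.
    counts = Counter(data["op"] for _, _, data in cell.edges(data=True))
    return {op: counts.get(op, 0) for op in op_set}


def min_path_over_ops(cell: nx.DiGraph, source, target, allowed_ops):
    # Keep only edges whose operation is in the allowed subset, then measure
    # the shortest remaining path; return None if input and output disconnect.
    sub = nx.DiGraph(
        (u, v) for u, v, d in cell.edges(data=True) if d["op"] in allowed_ops
    )
    if source not in sub or target not in sub:
        return None
    try:
        return nx.shortest_path_length(sub, source, target)
    except nx.NetworkXNoPath:
        return None
```

Feature values of this kind are tabular, so they can be concatenated with the precomputed zero-cost proxy scores and passed to the random forest or XGB+ predictors described above.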