Interpretable Neural Architecture Search via Bayesian Optimisation with Weisfeiler-Lehman Kernels
Authors: Binxin Ru, Xingchen Wan, Xiaowen Dong, Michael Osborne
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate empirically that our surrogate model is capable of identifying useful motifs which can guide the generation of new architectures. We finally show that our method outperforms existing NAS approaches to achieve the state of the art on both closed- and open-domain search spaces. |
| Researcher Affiliation | Academia | Binxin Ru, Xingchen Wan, Xiaowen Dong, Michael A. Osborne, Machine Learning Research Group, University of Oxford, UK {robin, xwan, xdong, mosb}@robots.ox.ac.uk |
| Pseudocode | Yes | Algorithm 1 NAS-BOWL Algorithm. Optional steps of the exemplary use of motif-based warm starting (Sec 3.2) are marked in gray italics. |
| Open Source Code | Yes | Codes are available at https://github.com/xingchenwan/nasbowl |
| Open Datasets | Yes | NAS-Bench-101 (Ying et al., 2019): ... The dataset and its API can be downloaded from https://github.com/google-research/nasbench/. NAS-Bench-201 (Dong and Yang, 2020): ... The dataset and its API can be downloaded from https://github.com/D-X-Y/NAS-Bench-201. Flowers102: ... trained on the Flowers102 dataset (Nilsback and Zisserman, 2008) |
| Dataset Splits | Yes | NAS-Bench-101: ... We can access the final training/validation/test accuracy... NAS-Bench-201: ... We can access the training accuracy/loss, validation accuracy/loss after every training epoch, the final test accuracy/loss... During the architecture search, we use half of the CIFAR-10 training data and leave the other half as the validation set. |
| Hardware Specification | Yes | All experiments were conducted on a 36-core 2.3 GHz Intel Xeon processor with 512 GB RAM. All experiments were conducted on a machine with an Intel Xeon-W processor with 64 GB RAM and a single NVIDIA GeForce RTX 2080 Ti GPU with 11 GB VRAM. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies like programming languages or libraries, only mentioning the use of existing codebases. |
| Experiment Setup | Yes | We use a batch size B = 5 (i.e., at each BO iteration, the architectures yielding the top 5 acquisition function values are selected to be evaluated in parallel). When the mutation algorithm described in Sec. 3.2 is used, we use a pool size of P = 200... We always use 10 random samples to initialise NAS-BOWL... We use the SGD optimiser with momentum of 0.9, weight decay of 3×10⁻⁴ and an initial learning rate of 0.025 which is cosine annealed to zero over 50 epochs. (Minimal sketches of the batched BO loop and this training recipe follow the table.) |
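The experiment-setup row above describes a batched BO loop: 10 random architectures initialise the surrogate, a mutation pool of P = 200 candidates is scored by the acquisition function, and the top B = 5 are evaluated in parallel. Below is a minimal Python sketch of that selection step; `sample_random`, `mutate_pool`, `fit_surrogate`, `acquisition`, and `evaluate` are hypothetical stand-ins for the NAS-BOWL components, not the released code.

```python
import numpy as np

# Hyperparameters quoted in the paper's setup.
N_INIT = 10      # random architectures used to initialise NAS-BOWL
POOL_SIZE = 200  # P: candidates generated by mutation at each iteration
BATCH_SIZE = 5   # B: top-acquisition architectures evaluated in parallel

def bo_nas_loop(sample_random, mutate_pool, fit_surrogate, acquisition,
                evaluate, n_iters):
    """Batched BO skeleton following the quoted setup (helpers are assumed)."""
    archs = [sample_random() for _ in range(N_INIT)]
    scores = [evaluate(a) for a in archs]
    for _ in range(n_iters):
        model = fit_surrogate(archs, scores)          # e.g. a GP with a WL kernel
        pool = mutate_pool(archs, scores, POOL_SIZE)  # mutation-based candidates
        acq = np.asarray([acquisition(model, a) for a in pool])
        batch = [pool[i] for i in np.argsort(-acq)[:BATCH_SIZE]]
        archs += batch
        scores += [evaluate(a) for a in batch]        # run in parallel in practice
    return archs[int(np.argmax(scores))]              # best architecture found
```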
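The same row quotes the training recipe used to evaluate architectures: SGD with momentum 0.9, weight decay 3×10⁻⁴, and an initial learning rate of 0.025 cosine-annealed to zero over 50 epochs. Here is a minimal PyTorch sketch of that configuration, assuming a placeholder `model` and synthetic data rather than the paper's actual pipeline.

```python
import torch

model = torch.nn.Linear(8, 2)  # placeholder for the searched architecture

# Optimiser and schedule as quoted: SGD, momentum 0.9, weight decay 3e-4,
# initial learning rate 0.025 cosine-annealed to zero over 50 epochs.
optimizer = torch.optim.SGD(model.parameters(), lr=0.025,
                            momentum=0.9, weight_decay=3e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=50, eta_min=0.0)

for epoch in range(50):
    optimizer.zero_grad()
    loss = model(torch.randn(16, 8)).sum()  # stand-in for the real training loss
    loss.backward()
    optimizer.step()
    scheduler.step()  # anneal the learning rate once per epoch
```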