Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?

Authors: Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, Mi Zhang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the performance of arch2vec on three commonly used NAS search spaces: NAS-Bench-101 [23], NAS-Bench-201 [24], and DARTS [15], and two search strategies based on reinforcement learning (RL) and Bayesian optimization (BO). Our results show that, with the same downstream search strategy, arch2vec consistently outperforms its discrete encoding and supervised architecture representation learning counterparts across all three search spaces.
Researcher Affiliation | Academia | Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, Mi Zhang, Michigan State University, {yanshen6,zhengy30,aowei,zengxia6,mizhang}@msu.edu
Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or formatted as such.
Open Source Code | Yes | The implementation of arch2vec is available at https://github.com/MSU-MLSys-Lab/arch2vec.
Open Datasets | Yes | We validate arch2vec on three commonly used NAS search spaces: NAS-Bench-101 [23], NAS-Bench-201 [24], and DARTS [15].
Dataset Splits | Yes | NAS-Bench-101. ... Each architecture comes with pre-computed validation and test accuracies on CIFAR-10. The cell consists of 7 nodes and can take on any DAG structure from the input to the output with at most 9 edges, with the first node as input and the last node as output. ... We split the dataset into 90% training and 10% held-out test sets for arch2vec pre-training.
Hardware Specification | No | The paper mentions "GPU days" in Table 4 but does not specify any particular GPU models, CPU models, or other detailed hardware specifications used for the experiments.
Software Dependencies | No | The paper mentions optimizers (Adam) and networks (LSTM, DNGO) but does not provide specific software dependency versions (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | For pre-training, we use a five-layer Graph Isomorphism Network (GIN) with hidden sizes of {128, 128, 128, 128, 16} as the encoder and a one-layer MLP with a hidden dimension of 16 as the decoder. The adjacency matrix is preprocessed as an undirected graph to allow bi-directional information flow. After forwarding the inputs to the model, the reconstruction error is minimized using the Adam optimizer [58] with a learning rate of 1e-3. We train the model with batch size 32, and the training loss converges well after 8 epochs on NAS-Bench-101 and 10 epochs on NAS-Bench-201 and DARTS. (A code sketch of this setup follows the table.)
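
The Experiment Setup and Dataset Splits rows describe the arch2vec pre-training pipeline concretely enough for a rough sketch. The PyTorch code below is a minimal, hedged reconstruction, not the authors' implementation (which is at https://github.com/MSU-MLSys-Lab/arch2vec): the names GINLayer, Arch2VecSketch, and pretrain are placeholders, the decoder structure (an MLP over latent node embeddings for operations, an inner product for edges) and the loss weighting are assumptions, and only the hyperparameters quoted above (layer sizes, Adam with learning rate 1e-3, batch size 32, 8 epochs on NAS-Bench-101, 90%/10% split) come from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GINLayer(nn.Module):
    """One GIN convolution: h' = MLP((1 + eps) * h + A h)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, x, adj):
        # x: [B, N, in_dim], adj: [B, N, N]
        return self.mlp((1 + self.eps) * x + torch.bmm(adj, x))


class Arch2VecSketch(nn.Module):
    """Variational graph autoencoder with a five-layer GIN encoder
    (hidden sizes 128, 128, 128, 128, 16) and a one-layer MLP decoder
    with hidden dimension 16, as reported in the Experiment Setup row."""
    def __init__(self, num_ops, hidden=(128, 128, 128, 128, 16)):
        super().__init__()
        dims = (num_ops,) + hidden
        self.gin = nn.ModuleList(GINLayer(dims[i], dims[i + 1])
                                 for i in range(len(hidden)))
        z_dim = hidden[-1]
        self.fc_mu = nn.Linear(z_dim, z_dim)
        self.fc_logvar = nn.Linear(z_dim, z_dim)
        # Assumption: node operations are decoded by the MLP, and edges are
        # recovered from the inner product of the latent node embeddings.
        self.op_decoder = nn.Sequential(nn.Linear(z_dim, 16), nn.ReLU(),
                                        nn.Linear(16, num_ops))

    def forward(self, ops, adj):
        # Preprocess the adjacency matrix as an undirected graph
        # (symmetrize) to allow bi-directional information flow.
        adj = ((adj + adj.transpose(1, 2)) > 0).float()
        h = ops
        for layer in self.gin:
            h = layer(h, adj)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        adj_recon = torch.sigmoid(torch.bmm(z, z.transpose(1, 2)))
        ops_recon = self.op_decoder(z)
        return ops_recon, adj_recon, adj, mu, logvar


def pretrain(model, loader, epochs=8, lr=1e-3):
    """Minimize the reconstruction error with Adam (lr 1e-3), batch size 32,
    and 8 epochs (NAS-Bench-101); the loss weighting here is an assumption."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for ops, adj in loader:  # ops: [B, 7, num_ops] one-hot floats, adj: [B, 7, 7]
            ops_recon, adj_recon, adj_sym, mu, logvar = model(ops, adj)
            recon = F.binary_cross_entropy(adj_recon, adj_sym) + \
                    F.cross_entropy(ops_recon.flatten(0, 1), ops.argmax(-1).flatten())
            kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
            loss = recon + kld
            opt.zero_grad()
            loss.backward()
            opt.step()


# 90% training / 10% held-out split as quoted in the Dataset Splits row
# ("dataset" is a placeholder for (ops, adj) pairs built from NAS-Bench-101):
#   n_train = int(0.9 * len(dataset))
#   train_set, test_set = torch.utils.data.random_split(
#       dataset, [n_train, len(dataset) - n_train])
#   loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
#   pretrain(Arch2VecSketch(num_ops=5), loader)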