Does Unsupervised Architecture Representation Learning Help Neural Architecture Search?

Authors: Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, Mi Zhang

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We validate the performance of arch2vec on three commonly used NAS search spaces: NAS-Bench-101 [23], NAS-Bench-201 [24], and DARTS [15], and two search strategies based on reinforcement learning (RL) and Bayesian optimization (BO). Our results show that, with the same downstream search strategy, arch2vec consistently outperforms its discrete encoding and supervised architecture representation learning counterparts across all three search spaces.
Researcher Affiliation | Academia | Shen Yan, Yu Zheng, Wei Ao, Xiao Zeng, Mi Zhang, Michigan State University, {yanshen6,zhengy30,aowei,zengxia6,mizhang}@msu.edu
Pseudocode | No | No pseudocode or algorithm blocks are explicitly labeled or formatted as such.
Open Source Code | Yes | The implementation of arch2vec is available at https://github.com/MSU-MLSys-Lab/arch2vec.
Open Datasets | Yes | We validate arch2vec on three commonly used NAS search spaces: NAS-Bench-101 [23], NAS-Bench-201 [24], and DARTS [15].
Dataset Splits | Yes | NAS-Bench-101. ... Each architecture comes with pre-computed validation and test accuracies on CIFAR-10. The cell consists of 7 nodes and can take on any DAG structure from the input to the output with at most 9 edges, with the first node as input and the last node as output. ... We split the dataset into 90% training and 10% held-out test sets for arch2vec pre-training.
Hardware Specification | No | The paper mentions "GPU days" in Table 4 but does not specify any particular GPU models, CPU models, or other detailed hardware specifications used for the experiments.
Software Dependencies | No | The paper mentions optimizers (Adam) and networks (LSTM, DNGO) but does not provide specific software dependency versions (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup | Yes | For pre-training, we use a five-layer Graph Isomorphism Network (GIN) with hidden sizes of {128, 128, 128, 128, 16} as the encoder and a one-layer MLP with a hidden dimension of 16 as the decoder. The adjacency matrix is preprocessed as an undirected graph to allow bi-directional information flow. After forwarding the inputs to the model, the reconstruction error is minimized using the Adam optimizer [58] with a learning rate of 1e-3. We train the model with batch size 32, and the training loss converges well after 8 epochs on NAS-Bench-101 and 10 epochs on NAS-Bench-201 and DARTS. (A code sketch of this setup follows the table.)
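
The Experiment Setup and Dataset Splits rows describe the arch2vec pre-training pipeline concretely enough for a rough sketch. The PyTorch code below is a minimal, hedged reconstruction, not the authors' implementation (which is at https://github.com/MSU-MLSys-Lab/arch2vec): the names GINLayer, Arch2VecSketch, and pretrain are placeholders, the decoder structure (an MLP over latent node embeddings for operations, an inner product for edges) and the loss weighting are assumptions, and only the hyperparameters quoted above (layer sizes, Adam with learning rate 1e-3, batch size 32, 8 epochs on NAS-Bench-101, 90%/10% split) come from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GINLayer(nn.Module):
    """One GIN convolution: h' = MLP((1 + eps) * h + A h)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.eps = nn.Parameter(torch.zeros(1))
        self.mlp = nn.Sequential(nn.Linear(in_dim, out_dim), nn.ReLU(),
                                 nn.Linear(out_dim, out_dim))

    def forward(self, x, adj):
        # x: [B, N, in_dim], adj: [B, N, N]
        return self.mlp((1 + self.eps) * x + torch.bmm(adj, x))


class Arch2VecSketch(nn.Module):
    """Variational graph autoencoder with a five-layer GIN encoder
    (hidden sizes 128, 128, 128, 128, 16) and a one-layer MLP decoder
    with hidden dimension 16, as reported in the Experiment Setup row."""
    def __init__(self, num_ops, hidden=(128, 128, 128, 128, 16)):
        super().__init__()
        dims = (num_ops,) + hidden
        self.gin = nn.ModuleList(GINLayer(dims[i], dims[i + 1])
                                 for i in range(len(hidden)))
        z_dim = hidden[-1]
        self.fc_mu = nn.Linear(z_dim, z_dim)
        self.fc_logvar = nn.Linear(z_dim, z_dim)
        # Assumption: node operations are decoded by the MLP, and edges are
        # recovered from the inner product of the latent node embeddings.
        self.op_decoder = nn.Sequential(nn.Linear(z_dim, 16), nn.ReLU(),
                                        nn.Linear(16, num_ops))

    def forward(self, ops, adj):
        # Preprocess the adjacency matrix as an undirected graph
        # (symmetrize) to allow bi-directional information flow.
        adj = ((adj + adj.transpose(1, 2)) > 0).float()
        h = ops
        for layer in self.gin:
            h = layer(h, adj)
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        adj_recon = torch.sigmoid(torch.bmm(z, z.transpose(1, 2)))
        ops_recon = self.op_decoder(z)
        return ops_recon, adj_recon, adj, mu, logvar


def pretrain(model, loader, epochs=8, lr=1e-3):
    """Minimize the reconstruction error with Adam (lr 1e-3), batch size 32,
    and 8 epochs (NAS-Bench-101); the loss weighting here is an assumption."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for ops, adj in loader:  # ops: [B, 7, num_ops] one-hot floats, adj: [B, 7, 7]
            ops_recon, adj_recon, adj_sym, mu, logvar = model(ops, adj)
            recon = F.binary_cross_entropy(adj_recon, adj_sym) + \
                    F.cross_entropy(ops_recon.flatten(0, 1), ops.argmax(-1).flatten())
            kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
            loss = recon + kld
            opt.zero_grad()
            loss.backward()
            opt.step()


# 90% training / 10% held-out split as quoted in the Dataset Splits row
# ("dataset" is a placeholder for (ops, adj) pairs built from NAS-Bench-101):
#   n_train = int(0.9 * len(dataset))
#   train_set, test_set = torch.utils.data.random_split(
#       dataset, [n_train, len(dataset) - n_train])
#   loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)
#   pretrain(Arch2VecSketch(num_ops=5), loader)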