PINAT: A Permutation INvariance Augmented Transformer for NAS Predictor
Authors: Shun Lu, Yu Hu, Peihao Wang, Yan Han, Jianchao Tan, Jixiang Li, Sen Yang, Ji Liu
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments over six public benchmark search spaces. We first compare the ranking ability of PINAT via various train-test data splits on NAS-Bench-101 (Ying et al. 2019) and NAS-Bench-201 (Dong and Yang 2020). (A minimal ranking-evaluation sketch follows the table.) |
| Researcher Affiliation | Collaboration | (1) Research Center for Intelligent Computing Systems, Institute of Computing Technology, Chinese Academy of Sciences; (2) School of Computer Science and Technology, University of Chinese Academy of Sciences; (3) University of Texas at Austin; (4) Kuaishou Technology; (5) Snap Inc.; (6) Meta Platforms, Inc. |
| Pseudocode | No | The paper contains architectural diagrams and mathematical formulations but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code is available at https://github.com/ShunLu91/PINAT. |
| Open Datasets | Yes | We conduct experiments over six public benchmark search spaces. ... NAS-Bench-101 (Ying et al. 2019) and NAS-Bench-201 (Dong and Yang 2020) are widely used benchmarks in recent years. ... CIFAR-10 is a standard image classification dataset... ImageNet (Krizhevsky et al. 2017) is a large-scale dataset... Protein-Protein Interactions (PPI) dataset... ModelNet (Wu et al. 2015). |
| Dataset Splits | Yes | Both benchmarks provide the validation accuracy and the test accuracy for each architecture, and we utilize the former to train our predictor while using the latter for evaluation. To provide an apples-to-apples comparison, we follow the same data splits in TNASP (Lu et al. 2021), noted as S1, S2, S3, S4, S5 on NAS-Bench-101 and S1', S2', S3', S4', S5' on NAS-Bench-201, detailed in Tab. 10 of our Supp. ... evaluate 1k architectures on the validation dataset by inheriting the pre-trained supernet weights to perform efficient inference to get architecture-accuracy pairs for our predictor to learn a mapping relationship. (These splits feed the ranking-evaluation sketch below.) |
| Hardware Specification | No | The paper mentions 'GPU days' as a cost metric but does not provide specific details about the GPU models, CPU models, or any other hardware components used for the experiments. |
| Software Dependencies | No | The paper does not provide specific software dependencies, such as library names with version numbers, needed to replicate the experiments. |
| Experiment Setup | Yes | Specifically, we utilize the uniform sampling method (Guo et al. 2020) to pre-train a supernet on the training set for 120 epochs... we choose the evolutionary algorithm (Deb et al. 2002) to search for optimal architectures for 100 generations and maintain a population of size 100 in each generation. Finally, we re-train our searched architectures with common DARTS strategies, i.e. 600 epochs by the SGD optimizer with the initial learning rate 2.5e-2 and weight decay 3e-4, and use the Cutout (DeVries and Taylor 2017) as the data augmentation. ... we re-train PINAT-S for 450 epochs with the batch size of 320. RMSprop TF optimizer is adopted with the initial learning rate 0.16 and weight decay 1e-5. To prevent over-fitting, we also use the AutoAugment (Cubuk et al. 2019) and RE (Zhong et al. 2020) to augment the training images. ... We randomly sample one path to train the supernet as SPOS (Guo et al. 2020) for 1000 epochs... We retrain this architecture for 2000 epochs with the Adam optimizer... stacking 3 cells and setting k nearest neighbor to 9. (Sketches of the supernet pre-training, evolutionary search, and retraining recipe follow the table.) |
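
The ranking comparison quoted under Research Type and Dataset Splits trains the predictor on validation accuracies and then measures how well its scores order held-out architectures by test accuracy. Below is a minimal sketch of that protocol, not the paper's implementation: the `predictor` object and its `fit`/`predict` methods are hypothetical stand-ins, while the Kendall's tau metric (the standard ranking measure on these benchmarks) is computed with SciPy.

```python
from scipy.stats import kendalltau

def evaluate_ranking(predictor, train_pairs, test_pairs):
    """train_pairs/test_pairs: lists of (architecture_encoding, accuracy).

    Train on validation accuracy, evaluate the ranking against test accuracy,
    mirroring the split usage quoted in the Dataset Splits row.
    """
    archs, val_accs = zip(*train_pairs)
    predictor.fit(archs, val_accs)            # fit on validation accuracies

    test_archs, test_accs = zip(*test_pairs)
    scores = [predictor.predict(a) for a in test_archs]

    # Kendall's tau compares the predicted ordering with the ground-truth
    # ordering; 1.0 means the predictor ranks every architecture correctly.
    tau, _ = kendalltau(scores, test_accs)
    return tau
```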
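
The supernet pre-training quoted in the Experiment Setup row follows SPOS (Guo et al. 2020): each optimization step trains one uniformly sampled path of the supernet. The sketch below assumes a path-conditioned forward signature `supernet(images, path)` and a `sample_path` callable, both hypothetical stand-ins for the released code; the learning rate is likewise an assumed placeholder.

```python
import torch

def train_supernet(supernet, loader, sample_path, epochs=120, lr=0.025):
    """Uniform single-path supernet training in the style of SPOS."""
    optimizer = torch.optim.SGD(supernet.parameters(), lr=lr, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, targets in loader:
            path = sample_path()  # uniformly sample one sub-architecture
            optimizer.zero_grad()
            loss = criterion(supernet(images, path), targets)
            loss.backward()
            # Only the sampled path participates in the forward pass,
            # so only its weights receive gradients this step.
            optimizer.step()
```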
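
The search stage runs an evolutionary algorithm for 100 generations with a population of 100. Note that the cited Deb et al. 2002 is NSGA-II, a multi-objective method; the sketch below is a deliberately simplified single-objective loop that scores candidates with the trained predictor. `random_architecture` and `mutate` are hypothetical helpers.

```python
import copy
import random

def evolutionary_search(predictor, random_architecture, mutate,
                        generations=100, population_size=100):
    """Predictor-guided evolutionary search (single-objective simplification)."""
    population = [random_architecture() for _ in range(population_size)]
    for _ in range(generations):
        # Rank the population by predicted accuracy and keep the top half.
        scored = sorted(population, key=predictor.predict, reverse=True)
        parents = scored[: population_size // 2]
        # Refill the population with mutated copies of random parents.
        children = [mutate(copy.deepcopy(random.choice(parents)))
                    for _ in range(population_size - len(parents))]
        population = parents + children
    return max(population, key=predictor.predict)
```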
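
Finally, the quoted DARTS-style retraining recipe (600 epochs, SGD with initial learning rate 2.5e-2 and weight decay 3e-4, Cutout augmentation) can be assembled as follows. The momentum of 0.9, cosine learning-rate schedule, and Cutout length of 16 are common DARTS defaults assumed here; the excerpt does not state them.

```python
import numpy as np
import torch

def make_retraining_setup(model, epochs=600):
    """SGD + cosine schedule matching the quoted hyperparameters."""
    optimizer = torch.optim.SGD(model.parameters(), lr=2.5e-2,
                                momentum=0.9, weight_decay=3e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                           T_max=epochs)
    return optimizer, scheduler

class Cutout:
    """Cutout (DeVries and Taylor 2017): zero out a random square patch
    of a (C, H, W) image tensor. Length 16 is the usual CIFAR-10 default."""
    def __init__(self, length=16):
        self.length = length

    def __call__(self, img):
        _, h, w = img.shape
        y, x = np.random.randint(h), np.random.randint(w)
        y1, y2 = max(0, y - self.length // 2), min(h, y + self.length // 2)
        x1, x2 = max(0, x - self.length // 2), min(w, x + self.length // 2)
        img[:, y1:y2, x1:x2] = 0.0
        return img
```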