DiffusionNAG: Predictor-guided Neural Architecture Generation with Diffusion Models
Authors: Sohyun An, Hayeon Lee, Jaehyeong Jo, Seanie Lee, Sung Ju Hwang
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the effectiveness of DiffusionNAG through extensive experiments in two predictor-based NAS scenarios: Transferable NAS and Bayesian Optimization (BO)-based NAS. DiffusionNAG achieves superior performance with speedups of up to 35× when compared to the baselines on Transferable NAS benchmarks. Furthermore, when integrated into a BO-based algorithm, DiffusionNAG outperforms existing BO-based NAS approaches, particularly in the large MobileNetV3 search space on the ImageNet 1K dataset. Code is available at https://github.com/CownowAn/DiffusionNAG. |
| Researcher Affiliation | Collaboration | KAIST¹, DeepAuto.ai², Seoul, South Korea; {sohyunan, hayeon926, harryjo97, lsnfamily02, sjhwang82}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1: General Bayesian Optimization NAS and Algorithm 2: Bayesian Optimization with DiffusionNAG are provided in Appendix C.6 (a hedged sketch of such a loop appears after this table). |
| Open Source Code | Yes | Code is available at https://github.com/CownowAn/DiffusionNAG. |
| Open Datasets | Yes | We evaluate our approach on four datasets following Lee et al. (2021a): CIFAR-10 (Krizhevsky, 2009), CIFAR-100 (Krizhevsky, 2009), Aircraft (Maji et al., 2013), and Oxford-IIIT Pets (Parkhi et al., 2012). |
| Dataset Splits | Yes | For CIFAR-10 and CIFAR-100, we use the predefined splits from the NAS-Bench-201 benchmark. For Aircraft and Oxford-IIIT Pets, we create random validation and test splits by dividing the test set into two equal-sized subsets. |
| Hardware Specification | Yes | The training process required 21.33 GPU hours (MBv3) and 3.43 GPU hours (NB201) on Tesla V100-SXM2, respectively. Our generation process, with a sampling batch size of 256, takes up to 2.02 GPU minutes on Tesla V100-SXM2 to sample one batch. |
| Software Dependencies | No | The paper mentions software components, but does not specify the library versions or other dependency details needed to recreate the software environment. |
| Experiment Setup | Yes | Following the training pipeline presented in Dong & Yang (2020b), we train each architecture using SGD with Nesterov momentum and employ the cross-entropy loss for 200 epochs. For regularization, we set the weight decay to 0.0005 and decay the learning rate from 0.1 to 0 using a cosine annealing schedule (Loshchilov & Hutter, 2016). We maintain consistency by utilizing the same set of hyperparameters across different datasets. |
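The Pseudocode row above refers to the paper's Algorithm 1 (general BO-based NAS) and Algorithm 2 (BO with DiffusionNAG) in Appendix C.6. As a rough illustration of how a generative proposal model can slot into a predictor-guided BO-style NAS loop, here is a minimal, self-contained Python sketch. All names (`toy_generator`, `toy_predictor`, `evaluate_architecture`) and the toy vector encoding are hypothetical stand-ins, not the paper's implementation.

```python
# Toy predictor-guided BO-style NAS loop: generator proposes candidates,
# a surrogate predictor ranks them, only the chosen one is "evaluated".
import numpy as np

rng = np.random.default_rng(0)
ARCH_DIM = 8  # toy encoding: each "architecture" is a vector of operation choices


def toy_generator(n_candidates, guidance=None):
    """Stand-in for a conditional generator: samples random encodings,
    optionally biased toward a guidance vector (crude stand-in for guidance)."""
    samples = rng.integers(0, 5, size=(n_candidates, ARCH_DIM)).astype(float)
    if guidance is not None:
        samples = 0.7 * samples + 0.3 * guidance
    return samples


def toy_predictor(history_x, history_y, candidates):
    """Stand-in surrogate: score candidates by closeness to the best observed arch."""
    best = history_x[int(np.argmax(history_y))]
    return -np.linalg.norm(candidates - best, axis=1)


def evaluate_architecture(arch):
    """Stand-in for the expensive train-and-validate step."""
    return -np.sum((arch - 2.0) ** 2) + rng.normal(scale=0.1)


# Initial random observations
history_x = toy_generator(5)
history_y = np.array([evaluate_architecture(a) for a in history_x])

for step in range(10):
    # 1) Propose candidates with the generator, guided by the current best arch
    guidance = history_x[int(np.argmax(history_y))]
    candidates = toy_generator(n_candidates=32, guidance=guidance)
    # 2) Rank candidates with the surrogate predictor (acquisition step)
    scores = toy_predictor(history_x, history_y, candidates)
    chosen = candidates[int(np.argmax(scores))]
    # 3) Evaluate the chosen architecture and update the observation history
    history_x = np.vstack([history_x, chosen])
    history_y = np.append(history_y, evaluate_architecture(chosen))

print("best observed score:", history_y.max())
```

The key structural point is that candidate architectures come from a learned generator rather than uniform random sampling, while the surrogate predictor still decides which candidate is worth the costly evaluation.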
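The Experiment Setup row above reports the architecture-training recipe: SGD with Nesterov momentum, cross-entropy loss, 200 epochs, weight decay 0.0005, and a cosine-annealed learning rate from 0.1 to 0. The PyTorch sketch below mirrors that recipe under stated assumptions; the placeholder model, the data loader, and the momentum value of 0.9 (a common default, not given in the quoted text) are assumptions, not details from the paper.

```python
# Minimal sketch of the reported training recipe (SGD + Nesterov momentum,
# cross-entropy, 200 epochs, weight decay 5e-4, cosine LR decay from 0.1 to 0).
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # placeholder network
criterion = nn.CrossEntropyLoss()

EPOCHS = 200
optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,            # initial learning rate, decayed to 0 by the scheduler
    momentum=0.9,      # assumed value; only "Nesterov momentum" is reported
    nesterov=True,
    weight_decay=5e-4, # reported weight decay
)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS, eta_min=0.0)


def train(train_loader):
    """Train the placeholder model with the reported schedule; the loader is assumed."""
    for epoch in range(EPOCHS):
        for images, labels in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # one cosine step per epoch
```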