AutoGO: Automated Computation Graph Optimization for Neural Network Evolution
Authors: Mohammad Salameh, Keith Mills, Negar Hassanpour, Fred Han, Shuting Zhang, Wei Lu, Shangling Jui, Chunhua Zhou, Fengyu Sun, Di Niu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experimental results show that AutoGO can automatically evolve several typical large convolutional networks to achieve significant task performance improvement and FLOPs reduction on a range of CV tasks, ranging from Classification, Semantic Segmentation, Human Pose Estimation, to Super Resolution, yet without introducing any newer primitive operations. We also demonstrate the lightweight deployment results of AutoGO-optimized super-resolution and denoising U-Nets on a cycle simulator for a Neural Processing Unit (NPU), achieving PSNR improvement and latency/power reduction simultaneously. |
| Researcher Affiliation | Collaboration | 1Huawei Technologies Canada. 2Dept. ECE, University of Alberta. 3Huawei Kirin Solution, China. |
| Pseudocode | Yes | Algorithm 1: Sample AutoGO pseudocode for one iteration |
| Open Source Code | Yes | Code available at https://github.com/Ascend-Research/AutoGO. |
| Open Datasets | Yes | We construct our database by extracting segments from 5 CIFAR-10 [33] benchmark families: NAS-Bench-101 [71], NAS-Bench-201 [17], HiAML, Inception, and Two-Path [48]. ... We train each network on ImageNet [58]. Then, we fine-tune the network on different tasks. For Semantic Segmentation (SS), we use a PSPNet [76] head structure and fine-tune on Cityscapes [14] to obtain mean Intersection over Union (mIoU) performance. For Human Pose Estimation (HPE), we adopt the method of [78] to fine-tune on MPII [4] to measure the Percentage of Correct Keypoints (PCK) of an architecture. |
| Dataset Splits | Yes | We split each family into training, validation, and testing partitions containing 80%, 10% and 10% of the overall CGs in that family. |
| Hardware Specification | Yes | We run our experiments on rack servers using Intel Xeon Gold 6140 CPUs. Each server is equipped with 8 NVIDIA V100 32GB GPUs and 756GB RAM. ... We measure latency on an Nvidia RTX 2080 Ti GPU... |
| Software Dependencies | Yes | We execute our search and experiments on Python 3 using PyTorch==1.8.1 and TensorFlow==1.15.0. We implement our predictors using PyTorch-Geometric==1.7.1. We use SentencePiece [34] to perform BPE. Finally, we implement our MILP using a Coin-CBC solver [18] and pyomo==6.4.0 [23]. |
| Experiment Setup | Yes | We train our predictors for 40 epochs with a batch size of 32 and an initial learning rate of 1e-4. ... We evaluate CIFAR-10 networks by training them 3 times for 200 epochs with a batch size of 256. We optimize the models using RMSProp with an initial learning rate of 1e-3 and a momentum factor of 0.9. We anneal the learning rate according to a cosine schedule. |
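
The Dataset Splits row above reports an 80%/10%/10% train/validation/test partition of each benchmark family's computation graphs. A minimal sketch of such a split, assuming a simple seeded random shuffle per family (the exact splitting procedure is not specified in the quoted text), could look like:

```python
# Hypothetical 80/10/10 split of one benchmark family's computation graphs.
# Not the authors' code; the shuffle-and-slice strategy is an assumption.
import random

def split_family(cgs, seed=0):
    """Partition a list of computation graphs into train/val/test (80/10/10)."""
    cgs = list(cgs)
    random.Random(seed).shuffle(cgs)
    n = len(cgs)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    train = cgs[:n_train]
    val = cgs[n_train:n_train + n_val]
    test = cgs[n_train + n_val:]
    return train, val, test
```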
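
The Experiment Setup row quotes the CIFAR-10 evaluation protocol: RMSProp with an initial learning rate of 1e-3 and momentum 0.9, batch size 256, 200 epochs, and cosine learning-rate annealing. A minimal PyTorch sketch of that training loop, with the model and data loader left as placeholders (this is an illustration, not the authors' implementation), might be:

```python
# Sketch of the quoted CIFAR-10 training setup: RMSProp (lr=1e-3, momentum=0.9),
# 200 epochs, cosine annealing. Batch size 256 is set when building the loader.
import torch
import torch.nn as nn
from torch.optim import RMSprop
from torch.optim.lr_scheduler import CosineAnnealingLR

def train_cifar10_network(model, train_loader, epochs=200, device="cuda"):
    model = model.to(device)
    criterion = nn.CrossEntropyLoss()
    optimizer = RMSprop(model.parameters(), lr=1e-3, momentum=0.9)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)
    for _ in range(epochs):
        model.train()
        for images, labels in train_loader:
            images, labels = images.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()  # anneal the learning rate on a cosine schedule
    return model
```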