HW-NAS-Bench: Hardware-Aware Neural Architecture Search Benchmark
Authors: Chaojian Li, Zhongzhi Yu, Yonggan Fu, Yongan Zhang, Yang Zhao, Haoran You, Qixuan Yu, Yue Wang, Cong Hao, Yingyan Lin
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To this end, we develop HW-NAS-Bench, the first public dataset for HW-NAS research which aims to democratize HW-NAS research to non-hardware experts and make HW-NAS research more reproducible and accessible. To design HW-NAS-Bench, we carefully collected the measured/estimated hardware performance (e.g., energy cost and latency) of all the networks in the search spaces of both NAS-Bench-201 and FBNet, on six hardware devices that fall into three categories (i.e., commercial edge devices, FPGA, and ASIC). Furthermore, we provide a comprehensive analysis of the collected measurements in HW-NAS-Bench to provide insights for HW-NAS research. Finally, we demonstrate exemplary user cases to (1) show that HW-NAS-Bench allows non-hardware experts to perform HW-NAS by simply querying our premeasured dataset and (2) verify that dedicated device-specific HW-NAS can indeed lead to optimal accuracy-cost trade-offs. |
| Researcher Affiliation | Academia | Department of Electrical and Computer Engineering Rice University {cl114,zy42,yf22,yz87,zy34,hy34,qy12,yw68,yingyan.lin}@rice.edu |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | The codes and all collected data are available at https://github.com/RICE-EIC/HW-NAS-Bench. |
| Open Datasets | Yes | NAS-Bench-201 further extends NAS-Bench-101 to support more NAS algorithm categories (e.g., differentiable algorithms) and more datasets (e.g., CIFAR-100 (Krizhevsky et al., 2009) and ImageNet16-120 (Chrabaszcz et al., 2017)). |
| Dataset Splits | No | The paper refers to using standard datasets and their associated ground-truth accuracies (e.g., from NAS-Bench-201), but it does not explicitly state the train/validation/test splits for its own experimental methodology or for the ProxylessNAS user case. It states 'Training log and accuracy are provided for each architecture' within NAS-Bench-201, but not the specific splits used in the paper's experiments. |
| Hardware Specification | Yes | Edge GPU: NVIDIA Edge GPU Jetson TX2 (Edge GPU) is a commercial device with a 256-core Pascal GPU and an 8GB LPDDR4, targeting IoT applications (NVIDIA Inc., a). When plugging an Edge GPU into the above hardware-cost collection pipeline, we first compile the network architectures in both NAS-Bench-201 and FBNet spaces to (1) convert them to the TensorRT format and (2) optimize the inference implementation within NVIDIA's recommended TensorRT runtime environment, and then execute them in the Edge GPU to measure the consumed energy and latency. Raspi 4: Raspberry Pi 4 (Raspi 4) is the latest Raspberry Pi device (Raspberry Pi Limited.), consisting of a Broadcom BCM2711 SoC and a 4GB LPDDR4. Edge TPU: An Edge TPU Dev Board (Edge TPU) (Google LLC., a) is a dedicated ASIC accelerator developed by Google, targeting Artificial Intelligence (AI) inference for edge applications. Pixel 3: Pixel 3 is one of the latest Pixel mobile phones (Google LLC., e), which are widely used as the target platforms by recent NAS works (Xiong et al., 2020; Howard et al., 2019; Tan et al., 2019). ASIC-Eyeriss: For collecting the hardware-cost data in ASIC, we consider a SOTA ASIC accelerator, Eyeriss (Chen et al., 2016). FPGA: ...obtain the hardware-cost on a Xilinx ZC706 board with a Zynq XC7Z045 SoC (Xilinx Inc., b). |
| Software Dependencies | No | The paper mentions software like TensorFlow, PyTorch, TensorFlow Lite, TensorRT, Keras, and the Vivado HLS toolflow, but it does not provide specific version numbers for any of these. For example, 'TensorFlow (Abadi et al., 2016)' cites the original paper but does not state the version used in their experiments. |
| Experiment Setup | Yes | For example, for collecting the hardware-cost in an Edge GPU, we first set the device in the Max-N mode to fully make use of all available resources following (Wofk et al., 2019), and then set up the embedded power rail monitor (Texas Instruments Inc.) to obtain the real-measured latency and energy via sysfs (Patrick Mochel and Mike Murphy.), averaging over 50 runs. We configure the Pixel 3 device to only use its big cores to reduce measurement variance, as in (Xiong et al., 2020; Tan et al., 2019). |
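The paper's central user case is that a HW-NAS practitioner replaces on-device measurement with a lookup into the pre-measured dataset. The sketch below illustrates that workflow in plain Python with a tiny made-up cost table; the function names, device keys, and all numbers are hypothetical stand-ins, not the real HW-NAS-Bench API (which the authors release at the GitHub URL above).

```python
from typing import Optional

# Hypothetical pre-measured costs, standing in for HW-NAS-Bench data:
# {arch_index: {device: {"latency_ms": ..., "energy_mJ": ...}}}
# The real dataset covers every architecture in the NAS-Bench-201 and
# FBNet search spaces across six devices; these two entries are made up.
PREMEASURED = {
    0: {"edge_gpu": {"latency_ms": 4.2, "energy_mJ": 21.0},
        "raspi4":   {"latency_ms": 38.5, "energy_mJ": 110.3}},
    1: {"edge_gpu": {"latency_ms": 6.1, "energy_mJ": 30.4},
        "raspi4":   {"latency_ms": 52.0, "energy_mJ": 145.9}},
}

def query_hw_cost(arch_index: int, device: str) -> dict:
    """Return the pre-measured hardware cost for one architecture on one device.

    In a HW-NAS loop this O(1) lookup is what replaces deploying the
    candidate network to physical hardware and measuring it.
    """
    return PREMEASURED[arch_index][device]

def best_under_latency(device: str, budget_ms: float) -> Optional[int]:
    """Pick the lowest-energy architecture that meets a latency budget."""
    feasible = {idx: costs[device] for idx, costs in PREMEASURED.items()
                if costs[device]["latency_ms"] <= budget_ms}
    if not feasible:
        return None
    return min(feasible, key=lambda idx: feasible[idx]["energy_mJ"])

print(query_hw_cost(0, "edge_gpu"))         # cost dict for arch 0 on the edge GPU
print(best_under_latency("edge_gpu", 5.0))  # 0 (only arch 0 meets the budget)
```

Device-specific search then reduces to calling `best_under_latency` (or a multi-objective variant) per target device, which is how the paper's dedicated device-specific HW-NAS user case can reach different accuracy-cost trade-offs on different hardware.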