HardCoRe-NAS: Hard Constrained diffeRentiable Neural Architecture Search
Authors: Niv Nayman, Yonathan Aflalo, Asaf Noy, Lihi Zelnik
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that HardCoRe-NAS generates state-of-the-art architectures, surpassing other NAS methods, while strictly satisfying the hard resource constraints without any tuning required. (...) We compare our generated architectures to other state-of-the-art NAS methods in Table 1 and Figure 1. |
| Researcher Affiliation | Industry | 1Alibaba Group, Tel Aviv, Israel. |
| Pseudocode | Yes | Algorithm 1 Block Coordinate SFW (BCSFW) — a generic Frank-Wolfe sketch is given below the table |
| Open Source Code | Yes | https://github.com/Alibaba-MIIL/HardCoReNAS |
| Open Datasets | Yes | We obtain the solution of the inner problem w as specified in sections 3.3 and 4.2.2 over 80% of a random 80-20 split of the ImageNet train set. |
| Dataset Splits | Yes | We obtain the solution of the inner problem w as specified in sections 3.3 and 4.2.2 over 80% of a random 80-20 split of the ImageNet train set. We utilize the remaining 20% as a validation set — a split sketch is given below the table |
| Hardware Specification | Yes | The search is performed according to section 3.4 for only 2 epochs of the validation set, lasting for 8 GPU hours [5]. (...) Latency measured on Intel Xeon CPU for a batch size of 1. (...) Experiments were performed on two platforms: Intel Xeon CPU and NVIDIA P100 GPU (...) The first 250 epochs took 280 GPU hours [4] and the additional 100 fine-tuning epochs took 120 GPU hours [5], summing to a total of 400 hours on NVIDIA V100 GPU to obtain w. (Footnotes: [4] Running with a batch size of 200 on 8 NVIDIA V100; [5] Running with a batch size of 16 on 8 NVIDIA V100) |
| Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify its version. It also refers to specific datasets and augmentation policies (e.g., AutoAugment, Cutout) but does not list software dependencies with version numbers. |
| Experiment Setup | Yes | For all of our experiments, we train our networks using SGD with a learning rate of 0.1 with cosine annealing, Nesterov momentum of 0.9, weight decay of 10^-4, applying label smoothing (Szegedy et al., 2016) of 0.1, cutout, Autoaugment (Cubuk et al., 2018), mixed precision and EMA smoothing. (...) train for 250 epochs a one-shot model w using the heaviest possible configuration (...) for additional 100 epochs of fine-tuning w over 80% of an 80-20 random split of the ImageNet train set (...) Running with a batch size of 200 on 8 NVIDIA V100 (...) Running with a batch size of 16 on 8 NVIDIA V100 — a minimal optimizer/loss sketch is given below the table |
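
The Pseudocode row cites Algorithm 1, Block Coordinate SFW (BCSFW). The paper's algorithm solves a linear program over simplex and latency constraints at each step; the sketch below only illustrates the general block-coordinate Frank-Wolfe update structure for simplex-constrained blocks. The names `simplex_lmo` and `bcsfw_sweep`, and the plain-simplex constraint set, are illustrative assumptions and not the paper's Algorithm 1.

```python
import torch

def simplex_lmo(grad):
    """Linear minimization oracle over the probability simplex:
    argmin_s <grad, s> is the one-hot vector at the smallest gradient entry."""
    s = torch.zeros_like(grad)
    s[grad.argmin()] = 1.0
    return s

@torch.no_grad()
def bcsfw_sweep(alpha_blocks, grad_blocks, gamma):
    """One block-coordinate Frank-Wolfe sweep: each block of architecture
    parameters (assumed to lie on a simplex) moves toward its LMO solution
    via a convex combination, so it stays feasible without projection."""
    for alpha, grad in zip(alpha_blocks, grad_blocks):
        s = simplex_lmo(grad)
        alpha.mul_(1.0 - gamma).add_(gamma * s)
```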
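The Open Datasets and Dataset Splits rows describe an 80-20 random split of the ImageNet train set, with the held-out 20% used as the validation set for the search. Below is a minimal sketch of such a split using torchvision's `ImageFolder` and `torch.utils.data.random_split`; the dataset path, the seed, and the omission of transforms are assumptions, and the paper does not state how its split was implemented.

```python
import torch
from torch.utils.data import random_split
from torchvision.datasets import ImageFolder

# Hypothetical path; standard ImageFolder layout of the ImageNet train set assumed.
full_train = ImageFolder("/path/to/imagenet/train")

n_total = len(full_train)
n_train = int(0.8 * n_total)  # 80% used to train the one-shot model w
split_gen = torch.Generator().manual_seed(0)  # seed is an assumption
train_set, val_set = random_split(
    full_train, [n_train, n_total - n_train], generator=split_gen
)  # val_set (the remaining 20%) serves as the validation set for the search
```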
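The Experiment Setup row lists SGD with a learning rate of 0.1, cosine annealing, Nesterov momentum of 0.9, weight decay of 10^-4, and label smoothing of 0.1. A minimal PyTorch sketch of that optimizer/scheduler/loss configuration follows; the placeholder `model`, the `T_max` of 250 epochs, and the use of `CrossEntropyLoss(label_smoothing=...)` are assumptions, and the remaining tricks (Cutout, AutoAugment, mixed precision, EMA smoothing) are omitted.

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 2)  # placeholder; stands in for the one-shot supernetwork

optimizer = torch.optim.SGD(
    model.parameters(),
    lr=0.1,              # initial learning rate
    momentum=0.9,        # Nesterov momentum
    nesterov=True,
    weight_decay=1e-4,
)
# Cosine annealing over the 250-epoch training schedule quoted above
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=250)
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # requires PyTorch >= 1.10
```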