HardCoRe-NAS: Hard Constrained diffeRentiable Neural Architecture Search

Authors: Niv Nayman, Yonathan Aflalo, Asaf Noy, Lihi Zelnik-Manor

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that HardCoRe-NAS generates state-of-the-art architectures, surpassing other NAS methods, while strictly satisfying the hard resource constraints without any tuning required. (...) We compare our generated architectures to other state-of-the-art NAS methods in Table 1 and Figure 1.
Researcher Affiliation | Industry | Alibaba Group, Tel Aviv, Israel.
Pseudocode | Yes | Algorithm 1: Block Coordinate SFW (BCSFW). (An illustrative Frank-Wolfe sketch follows the table.)
Open Source Code | Yes | https://github.com/Alibaba-MIIL/HardCoReNAS
Open Datasets | Yes | We obtain the solution of the inner problem w as specified in sections 3.3 and 4.2.2 over 80% of a random 80-20 split of the ImageNet train set.
Dataset Splits | Yes | We obtain the solution of the inner problem w as specified in sections 3.3 and 4.2.2 over 80% of a random 80-20 split of the ImageNet train set. We utilize the remaining 20% as a validation set. (A split sketch follows the table.)
Hardware Specification | Yes | The search is performed according to section 3.4 for only 2 epochs of the validation set, lasting for 8 GPU hours [5]. (...) Latency measured on Intel Xeon CPU for a batch size of 1. (...) Experiments were performed on two platforms: Intel Xeon CPU and NVIDIA P100 GPU (...) The first 250 epochs took 280 GPU hours [4] and the additional 100 fine-tuning epochs took 120 GPU hours [5], summing to a total of 400 hours on NVIDIA V100 GPU to obtain w. (Footnotes: [4] Running with a batch size of 200 on 8 NVIDIA V100; [5] Running with a batch size of 16 on 8 NVIDIA V100.)
Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify its version. It also refers to specific datasets and augmentation policies (e.g., AutoAugment, Cutout) but doesn't list software dependencies with version numbers.
Experiment Setup | Yes | For all of our experiments, we train our networks using SGD with a learning rate of 0.1 with cosine annealing, Nesterov momentum of 0.9, weight decay of 10^-4, applying label smoothing (Szegedy et al., 2016) of 0.1, cutout, Autoaugment (Cubuk et al., 2018), mixed precision and EMA smoothing. (...) train for 250 epochs a one-shot model w using the heaviest possible configuration (...) for additional 100 epochs of fine-tuning w over 80% of an 80-20 random split of the ImageNet train set (...) Running with a batch size of 200 on 8 NVIDIA V100 (...) Running with a batch size of 16 on 8 NVIDIA V100. (A training-setup sketch follows the table.)
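
The Pseudocode row points at Algorithm 1, Block Coordinate Stochastic Frank-Wolfe (BCSFW). The paper's algorithm operates over per-block simplices intersected with a hard latency constraint; the snippet below is only a minimal Python sketch of a plain block-coordinate Frank-Wolfe step over per-block simplices, with hypothetical names, to illustrate the projection-free update such a method builds on.

    import torch

    def block_frank_wolfe_step(alpha_blocks, grad_blocks, gamma):
        # One conditional-gradient (Frank-Wolfe) step per block of architecture
        # parameters, each block assumed to live on a probability simplex.
        updated = []
        for alpha, grad in zip(alpha_blocks, grad_blocks):
            vertex = torch.zeros_like(alpha)
            vertex[grad.argmin()] = 1.0  # linear minimization oracle over the simplex
            updated.append((1.0 - gamma) * alpha + gamma * vertex)  # convex step stays feasible
        return updated

    # Example: two blocks with three candidate operations each, step size 0.1.
    alphas = [torch.full((3,), 1.0 / 3), torch.full((3,), 1.0 / 3)]
    grads = [torch.tensor([0.2, -0.5, 0.1]), torch.tensor([-0.3, 0.0, 0.4])]
    alphas = block_frank_wolfe_step(alphas, grads, gamma=0.1)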
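
The 80-20 random split of the ImageNet train set quoted in the Open Datasets and Dataset Splits rows is not released as an index file; below is a minimal sketch of how such a split could be reproduced in PyTorch, assuming a local ImageFolder copy of the train set, a hypothetical path, and an arbitrary fixed seed.

    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    # Hypothetical path; the paper does not publish its exact split indices.
    train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=transforms.ToTensor())

    n_total = len(train_set)
    n_train = int(0.8 * n_total)  # 80% for training the one-shot model
    generator = torch.Generator().manual_seed(0)  # fixed seed so the split is reproducible
    train_80, val_20 = random_split(train_set, [n_train, n_total - n_train], generator=generator)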
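
The hyperparameters quoted in the Experiment Setup row map onto standard PyTorch components. The sketch below wires up the optimizer, cosine-annealing schedule, and label-smoothed loss under the assumption of PyTorch >= 1.10 (for the label_smoothing argument) and a placeholder model; Cutout, AutoAugment, mixed precision, and EMA smoothing are omitted.

    import torch
    import torch.nn as nn

    epochs = 250
    # Placeholder module; the paper trains a one-shot super-network, not reconstructed here.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))

    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.1,             # initial learning rate of 0.1
        momentum=0.9,       # Nesterov momentum of 0.9
        nesterov=True,
        weight_decay=1e-4,  # weight decay of 10^-4
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing of 0.1

Calling scheduler.step() once per epoch makes the learning rate follow the quoted cosine annealing over the 250 epochs.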