HardCoRe-NAS: Hard Constrained diffeRentiable Neural Architecture Search

Authors: Niv Nayman, Yonathan Aflalo, Asaf Noy, Lihi Zelnik-Manor

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experiments show that HardCoRe-NAS generates state-of-the-art architectures, surpassing other NAS methods, while strictly satisfying the hard resource constraints without any tuning required. (...) We compare our generated architectures to other state-of-the-art NAS methods in Table 1 and Figure 1.
Researcher Affiliation | Industry | Alibaba Group, Tel Aviv, Israel.
Pseudocode | Yes | Algorithm 1: Block Coordinate SFW (BCSFW). (An illustrative Frank-Wolfe sketch follows the table.)
Open Source Code | Yes | https://github.com/Alibaba-MIIL/HardCoReNAS
Open Datasets | Yes | We obtain the solution of the inner problem w as specified in sections 3.3 and 4.2.2 over 80% of a random 80-20 split of the ImageNet train set.
Dataset Splits | Yes | We obtain the solution of the inner problem w as specified in sections 3.3 and 4.2.2 over 80% of a random 80-20 split of the ImageNet train set. We utilize the remaining 20% as a validation set. (A split sketch follows the table.)
Hardware Specification | Yes | The search is performed according to section 3.4 for only 2 epochs of the validation set, lasting for 8 GPU hours [5]. (...) Latency measured on Intel Xeon CPU for a batch size of 1. (...) Experiments were performed on two platforms: Intel Xeon CPU and NVIDIA P100 GPU (...) The first 250 epochs took 280 GPU hours [4] and the additional 100 fine-tuning epochs took 120 GPU hours [5], summing to a total of 400 hours on NVIDIA V100 GPU to obtain w. (Footnotes: [4] Running with a batch size of 200 on 8 NVIDIA V100; [5] Running with a batch size of 16 on 8 NVIDIA V100.)
Software Dependencies | No | The paper mentions a 'PyTorch implementation' but does not specify its version. It also refers to specific datasets and augmentation policies (e.g., AutoAugment, Cutout) but doesn't list software dependencies with version numbers.
Experiment Setup | Yes | For all of our experiments, we train our networks using SGD with a learning rate of 0.1 with cosine annealing, Nesterov momentum of 0.9, weight decay of 10^-4, applying label smoothing (Szegedy et al., 2016) of 0.1, cutout, Autoaugment (Cubuk et al., 2018), mixed precision and EMA smoothing. (...) train for 250 epochs a one-shot model w using the heaviest possible configuration (...) for additional 100 epochs of fine-tuning w over 80% of an 80-20 random split of the ImageNet train set (...) Running with a batch size of 200 on 8 NVIDIA V100 (...) Running with a batch size of 16 on 8 NVIDIA V100. (A training-setup sketch follows the table.)
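
The Pseudocode row points at Algorithm 1, Block Coordinate Stochastic Frank-Wolfe (BCSFW). The paper's algorithm operates over per-block simplices intersected with a hard latency constraint; the snippet below is only a minimal Python sketch of a plain block-coordinate Frank-Wolfe step over per-block simplices, with hypothetical names, to illustrate the projection-free update such a method builds on.

    import torch

    def block_frank_wolfe_step(alpha_blocks, grad_blocks, gamma):
        # One conditional-gradient (Frank-Wolfe) step per block of architecture
        # parameters, each block assumed to live on a probability simplex.
        updated = []
        for alpha, grad in zip(alpha_blocks, grad_blocks):
            vertex = torch.zeros_like(alpha)
            vertex[grad.argmin()] = 1.0  # linear minimization oracle over the simplex
            updated.append((1.0 - gamma) * alpha + gamma * vertex)  # convex step stays feasible
        return updated

    # Example: two blocks with three candidate operations each, step size 0.1.
    alphas = [torch.full((3,), 1.0 / 3), torch.full((3,), 1.0 / 3)]
    grads = [torch.tensor([0.2, -0.5, 0.1]), torch.tensor([-0.3, 0.0, 0.4])]
    alphas = block_frank_wolfe_step(alphas, grads, gamma=0.1)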
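
The 80-20 random split of the ImageNet train set quoted in the Open Datasets and Dataset Splits rows is not released as an index file; below is a minimal sketch of how such a split could be reproduced in PyTorch, assuming a local ImageFolder copy of the train set, a hypothetical path, and an arbitrary fixed seed.

    import torch
    from torch.utils.data import random_split
    from torchvision import datasets, transforms

    # Hypothetical path; the paper does not publish its exact split indices.
    train_set = datasets.ImageFolder("/path/to/imagenet/train", transform=transforms.ToTensor())

    n_total = len(train_set)
    n_train = int(0.8 * n_total)  # 80% for training the one-shot model
    generator = torch.Generator().manual_seed(0)  # fixed seed so the split is reproducible
    train_80, val_20 = random_split(train_set, [n_train, n_total - n_train], generator=generator)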
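
The hyperparameters quoted in the Experiment Setup row map onto standard PyTorch components. The sketch below wires up the optimizer, cosine-annealing schedule, and label-smoothed loss under the assumption of PyTorch >= 1.10 (for the label_smoothing argument) and a placeholder model; Cutout, AutoAugment, mixed precision, and EMA smoothing are omitted.

    import torch
    import torch.nn as nn

    epochs = 250
    # Placeholder module; the paper trains a one-shot super-network, not reconstructed here.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))

    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=0.1,             # initial learning rate of 0.1
        momentum=0.9,       # Nesterov momentum of 0.9
        nesterov=True,
        weight_decay=1e-4,  # weight decay of 10^-4
    )
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)  # label smoothing of 0.1

Calling scheduler.step() once per epoch makes the learning rate follow the quoted cosine annealing over the 250 epochs.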