Saliency-Aware Neural Architecture Search

Authors: Ramtin Hosseini, Pengtao Xie

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on several datasets demonstrate the effectiveness of our framework. In this section, we present experimental results.
Researcher Affiliation | Academia | Ramtin Hosseini and Pengtao Xie, UC San Diego, rhossein@eng.ucsd.edu, p1xie@eng.ucsd.edu
Pseudocode | No | The paper describes its method in prose, including a four-stage optimization framework and an optimization algorithm, but does not present any pseudocode or clearly labeled algorithm blocks.
Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The data is publicly available. The code is proprietary.
Open Datasets | Yes | Datasets: We used three datasets: CIFAR-10 [35], CIFAR-100 [36], and ImageNet [15].
Dataset Splits | Yes | For each of them, we split it into a train, validation, and test set with 25K, 25K, and 10K images respectively. Following [66], we randomly sample 10% of the 1.2M images to form a new training set and another 2.5% to form a validation set, then perform a search on them.
Hardware Specification | Yes | search cost (GPU days on a Nvidia 1080Ti). Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] They are included in the supplements.
Software Dependencies | No | The paper does not explicitly specify software dependencies with version numbers (e.g., specific libraries or frameworks like PyTorch or TensorFlow with their versions).
Experiment Setup | Yes | The tradeoff parameter γ is set to 2. The norm bound ε of perturbations is set to 0.03. ... with a batch size of 64, an initial learning rate of 0.025 with cosine scheduling, an epoch number of 50, a weight decay of 3e-4, and a momentum of 0.9. We optimize weight parameters using SGD. The initial learning rate is set to 2e-2. It is annealed using a cosine scheduler. The momentum is set to 0.9. We use Adam [34] to optimize the architecture variables. The learning rate is set to 3e-4 and weight decay is set to 1e-3.
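The Dataset Splits row quotes a 25K/25K/10K train/validation/test split for the CIFAR datasets and a 10% / 2.5% subsample of the roughly 1.2M ImageNet training images for architecture search. Below is a minimal sketch of such splits, assuming PyTorch and torchvision; the library choice, data path, and random seed are not specified in the paper and are assumptions here.

```python
# Minimal sketch (not the authors' code) of the quoted data splits, assuming
# PyTorch / torchvision; the data path and random seed are assumptions.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()
generator = torch.Generator().manual_seed(0)  # seed not specified in the paper

# CIFAR-10 (and CIFAR-100) ship with 50K training and 10K test images; the
# quote splits the 50K training images into 25K train / 25K validation.
full_train = datasets.CIFAR10("./data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10("./data", train=False, download=True, transform=transform)
train_set, val_set = random_split(full_train, [25_000, 25_000], generator=generator)
print(len(train_set), len(val_set), len(test_set))  # 25000 25000 10000

# ImageNet: the quote samples 10% of the ~1.2M training images to form a new
# training set and another 2.5% for validation; an index-level sketch:
num_images = 1_281_167  # standard ImageNet-1k training-set size
perm = torch.randperm(num_images, generator=generator)
search_train_idx = perm[: int(0.10 * num_images)]
search_val_idx = perm[int(0.10 * num_images): int(0.125 * num_images)]
```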
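The Experiment Setup row lists the search-phase hyperparameters: SGD with momentum 0.9, weight decay 3e-4, and a cosine-annealed learning rate for the network weights, and Adam with learning rate 3e-4 and weight decay 1e-3 for the architecture variables. The sketch below wires those quoted values into PyTorch optimizers; it is not the authors' implementation, and the placeholder parameter lists, the T_max choice, and the loop structure are assumptions.

```python
# Minimal sketch (not the authors' implementation) of the optimizer settings
# quoted in the Experiment Setup row; parameter lists are placeholders.
import torch

gamma = 2.0      # quoted tradeoff parameter (enters the paper's loss; unused in this sketch)
epsilon = 0.03   # quoted norm bound on perturbations (unused in this sketch)

# Placeholder parameter groups standing in for the searched network's weights
# and its architecture variables.
weight_params = [torch.nn.Parameter(torch.randn(8, 8))]
arch_params = [torch.nn.Parameter(torch.zeros(14, 8))]

epochs = 50      # quoted epoch number (the quote also gives a batch size of 64)

# Network weights: SGD with momentum 0.9, weight decay 3e-4, cosine-annealed
# learning rate. The quote mentions both 0.025 and 2e-2 as initial learning
# rates, presumably for different stages of the framework; 0.025 is used here.
w_optimizer = torch.optim.SGD(weight_params, lr=0.025, momentum=0.9, weight_decay=3e-4)
w_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(w_optimizer, T_max=epochs)

# Architecture variables: Adam with learning rate 3e-4 and weight decay 1e-3.
a_optimizer = torch.optim.Adam(arch_params, lr=3e-4, weight_decay=1e-3)

for epoch in range(epochs):
    # ... alternate weight updates (w_optimizer) and architecture updates
    # (a_optimizer) over training/validation batches here ...
    w_optimizer.step()   # placeholder step so the scheduler advances cleanly
    w_scheduler.step()
```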