Saliency-Aware Neural Architecture Search
Authors: Ramtin Hosseini, Pengtao Xie
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on several datasets demonstrate the effectiveness of our framework. In this section, we present experimental results. |
| Researcher Affiliation | Academia | Ramtin Hosseini and Pengtao Xie, UC San Diego; rhossein@eng.ucsd.edu, p1xie@eng.ucsd.edu |
| Pseudocode | No | The paper describes its method in prose, including a four-stage optimization framework and an optimization algorithm, but does not present any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | No | Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)? [No] The data is publicly available. The code is proprietary. |
| Open Datasets | Yes | Datasets: We used three datasets: CIFAR-10 [35], CIFAR-100 [36], and ImageNet [15]. |
| Dataset Splits | Yes | For each of them, we split it into a train, validation, and test set with 25K, 25K, and 10K images respectively. ... Following [66], we randomly sample 10% of the 1.2M images to form a new training set and another 2.5% to form a validation set, then perform a search on them. (A sketch of the quoted CIFAR split appears after the table.) |
| Hardware Specification | Yes | search cost (GPU days on an Nvidia 1080Ti). Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] They are included in the supplements. |
| Software Dependencies | No | The paper does not explicitly specify software dependencies with version numbers (e.g., specific libraries or frameworks like PyTorch or TensorFlow with their versions). |
| Experiment Setup | Yes | The tradeoff parameter γ is set to 2. The norm bound ε of perturbations is set to 0.03. ... with a batch size of 64, an initial learning rate of 0.025 with cosine scheduling, an epoch number of 50, a weight decay of 3e-4, and a momentum of 0.9. We optimize weight parameters using SGD. The initial learning rate is set to 2e-2. It is annealed using a cosine scheduler. The momentum is set to 0.9. We use Adam [34] to optimize the architecture variables. The learning rate is set to 3e-4 and weight decay is set to 1e-3. (A sketch of this optimizer configuration appears after the table.) |
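
The Dataset Splits row quotes a 25K/25K/10K train/validation/test split of each CIFAR dataset. A minimal sketch of how such a split could be drawn is given below, assuming torchvision (the paper does not name its framework) and splitting the official 50K CIFAR-10 training images in half; the random seed is an assumption, since the paper does not say how the split was sampled.

```python
# Hedged sketch of the quoted 25K/25K/10K CIFAR split; framework and seed
# are assumptions, not details stated in the paper.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

tfm = transforms.ToTensor()
full_train = datasets.CIFAR10("data", train=True, download=True, transform=tfm)
test_set = datasets.CIFAR10("data", train=False, download=True, transform=tfm)  # 10K images

# Split the official 50K training images into 25K train / 25K validation.
train_set, val_set = random_split(
    full_train, [25_000, 25_000],
    generator=torch.Generator().manual_seed(0),  # seed is an assumption
)
```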
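
The Experiment Setup row quotes two optimizers: SGD with a cosine-annealed learning rate for weight parameters and Adam for architecture variables, as in DARTS-style search. The sketch below wires up those quoted hyperparameters in PyTorch; `model.weight_parameters()` and `model.arch_parameters()` are hypothetical accessors for the two variable groups of a supernet, not functions from the paper's (proprietary) code.

```python
# Hedged sketch of the quoted optimizer configuration, assuming PyTorch.
# The accessor names on `model` are hypothetical placeholders.
import torch

def build_optimizers(model, epochs=50):
    # Weight parameters: SGD with momentum and a cosine-annealed learning rate,
    # using the values quoted in the table row above.
    w_opt = torch.optim.SGD(
        model.weight_parameters(),
        lr=2e-2,
        momentum=0.9,
        weight_decay=3e-4,
    )
    w_sched = torch.optim.lr_scheduler.CosineAnnealingLR(w_opt, T_max=epochs)

    # Architecture variables: Adam with the quoted learning rate and weight decay.
    a_opt = torch.optim.Adam(
        model.arch_parameters(),
        lr=3e-4,
        weight_decay=1e-3,
    )
    return w_opt, w_sched, a_opt
```

Using separate optimizers for weights and architecture variables matches the bi-level structure the paper describes; how the four optimization stages interleave these updates is not recoverable from the quoted text.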