Meta-prediction Model for Distillation-Aware NAS on Unseen Datasets
Authors: Hayeon Lee, Sohyun An, Minseon Kim, Sung Ju Hwang
ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The experimental results demonstrate that our proposed meta-prediction model successfully generalizes to multiple unseen datasets for DaNAS tasks, largely outperforming existing meta-NAS methods and rapid NAS baselines. We meta-learn the proposed distillation-aware prediction model on the subsets of Tiny ImageNet and neural architectures from the ResNet search space. Then we validate its prediction performance on heterogeneous unseen datasets such as CUB, Stanford Cars, DTD, QuickDraw, Crop Disease, EuroSAT, ISIC, ChestX, and ImageNet-1K. |
| Researcher Affiliation | Academia | Hayeon Lee, Sohyun An, Minseon Kim, Sung Ju Hwang. KAIST, South Korea. {hayeon926, sohyunan, minseonkim, sjhwang82}@kaist.ac.kr |
| Pseudocode | No | The paper describes methods through equations and text but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/CownowAn/DaSS. |
| Open Datasets | Yes | We use Tiny ImageNet (Le & Yang, 2015) as the source dataset for meta-training. We covered 9 datasets, including fine-grained datasets (CUB (Wah et al., 2011), Stanford Cars (Krause et al., 2013) (Cars), Crop Disease (Guo et al., 2020) agriculture images), out-of-distribution datasets (DTD (Cimpoi et al., 2014) texture images, QuickDraw (Jonas et al.) (Draw) black-and-white drawing images, EuroSAT (Guo et al., 2020) satellite images, ISIC (Guo et al., 2020) medical images of skin lesions, ChestX (Guo et al., 2020) X-ray images), and the large-scale dataset ImageNet-1K (Russakovsky et al., 2015). |
| Dataset Splits | Yes | We use subsets consisting of 8 and 2 splits for meta-training and meta-validation, respectively. For each task, we consider the dataset-teacher pair while randomly selecting distilled student architecture-accuracy pairs to train meta-prediction models. Each task consists of a specific teacher network trained on a specific dataset, the accuracy of the teacher network measured on the dataset, 50 student architecture candidates, and their actual accuracy obtained by distilling the knowledge from the teacher network to each student architecture. (A sketch of this task structure follows the table.) |
| Hardware Specification | Yes | In all cases, the time taken was measured on a single NVIDIA RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., PyTorch 1.x, Python 3.x, CUDA 11.x) for reproducing the experiments. |
| Experiment Setup | Yes | For every knowledge distillation process, we distill the knowledge of the teacher network to the student by leveraging the KD-aware loss, L_{KD,α}, for 50 epochs with a learning rate of 5e-2, and we set α to 0.5. When conducting the inner gradient updates with the teacher-accuracy pair, the total number of inner gradient steps is set to 1. The search space in this section is defined by Cai et al. (2020) and includes greater depths and channel widths than in the previous experiments. (A hedged sketch of this loss appears after the table.) |
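
The Dataset Splits row describes each meta-training task as a dataset-teacher pair together with 50 distilled student architecture-accuracy pairs. Below is a minimal sketch of how one such task could be represented; the class and field names (`DistillationTask`, `students`, and so on) are illustrative assumptions, not identifiers from the released DaSS code.

```python
# Illustrative container for one meta-training task as described in the
# Dataset Splits row. Names and types are assumptions, not the paper's code.
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class DistillationTask:
    dataset_name: str                  # e.g. one Tiny ImageNet subset used for meta-training
    teacher_id: str                    # identifier of the teacher network trained on that dataset
    teacher_accuracy: float            # teacher accuracy measured on the dataset
    students: List[Tuple[str, float]]  # 50 (architecture encoding, distilled accuracy) pairs
```

A meta-batch would then simply be a list of such tasks, with the student pairs sampled randomly per task as stated in the row above.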
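
The Experiment Setup row specifies the KD-aware loss L_{KD,α} with α = 0.5, 50 distillation epochs, and a learning rate of 5e-2. The sketch below shows a standard distillation objective of this form, assuming PyTorch (the paper lists no framework versions) and a soft-target temperature of 4; the exact composition of the paper's L_{KD,α} may differ.

```python
# Hedged sketch of a KD-aware loss of the form reported in the Experiment Setup row.
# PyTorch, the temperature value, and the exact term weighting are assumptions.
import torch
import torch.nn.functional as F


def kd_aware_loss(student_logits: torch.Tensor,
                  teacher_logits: torch.Tensor,
                  labels: torch.Tensor,
                  alpha: float = 0.5,
                  temperature: float = 4.0) -> torch.Tensor:
    """Weighted sum of hard-label cross-entropy and a softened teacher KL term."""
    # Hard-label term on the student's raw logits.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-target term: KL divergence between temperature-scaled student and
    # teacher distributions, rescaled by T^2 as in Hinton-style distillation.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # alpha = 0.5 per the reported setup; how the two terms are combined is an assumption.
    return alpha * ce + (1.0 - alpha) * kl
```

For the optimizer, only the learning rate (5e-2) and the 50-epoch budget are stated; something like `torch.optim.SGD(student.parameters(), lr=5e-2)` would match those numbers, but the choice of SGD itself is an assumption.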