Towards Oracle Knowledge Distillation with Neural Architecture Search
Authors: Minsoo Kang, Jonghwan Mun, Bohyung Han
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform extensive experiments on the image classification datasets CIFAR-100 and Tiny ImageNet using various networks. |
| Researcher Affiliation | Academia | ¹Computer Vision Lab., ASRI, Seoul National University, Korea; ²Computer Vision Lab., POSTECH, Korea; ³Neural Processing Research Center (NPRC), Seoul National University, Korea; ¹{kminsoo, bhhan}@snu.ac.kr; ²jonghwan.mun@postech.ac.kr |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. It presents mathematical formulations but not in a structured algorithm format. |
| Open Source Code | No | The paper mentions employing 'publicly available ENAS (Pham et al. 2018) code for neural architecture search implementation in TensorFlow' and provides a link to that third-party repository (https://github.com/melodyguan/enas). It does not explicitly state that their own implementation code for the described methodology is open-source or provide a link to it. |
| Open Datasets | Yes | We evaluate our algorithm on the image classification task using CIFAR-100 and Tiny ImageNet datasets. CIFAR-100 dataset (Krizhevsky 2009) is composed of 50,000 training and 10,000 testing images in 100 classes. |
| Dataset Splits | Yes | For architecture search, 10% of training images are held out as training-validation set to compute reward. |
| Hardware Specification | No | The paper does not explicitly describe the specific hardware (e.g., GPU models, CPU types, or cloud compute instances with specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions using 'TensorFlow (Abadi et al. 2016)' and 'PyTorch (Paszke et al. 2017)' but does not specify their version numbers or any other software dependencies with versions. |
| Experiment Setup | Yes | We optimize the networks for 300 epochs using SGD with Nesterov momentum (Sutskever et al. 2013) of 0.9, a weight decay of 0.0001, and a batch size of 128. Following (Lan, Zhu, and Gong 2018), the initial learning rate is set to 0.1 and is divided by 10 at the 150th and 225th epochs, respectively. We also perform a warm-up strategy (He et al. 2016) with a learning rate of 0.01 for ResNet-110 until the 400th and 900th iterations for the CIFAR-100 and Tiny ImageNet datasets, respectively. For KD and OD, the temperature T is fixed to 3 and the balancing factor λ is set to 0. |
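The "Dataset Splits" row above notes that 10% of the training images are held out as a training-validation set to compute the architecture-search reward. The following is a minimal sketch of such a 90/10 split for CIFAR-100 with torchvision; it is not the authors' code, and the data directory and random seed are assumptions.

```python
# Hedged sketch: hold out 10% of the CIFAR-100 training set as a
# training-validation split for the architecture-search reward.
# The root path and the seed below are assumptions, not from the paper.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_set = datasets.CIFAR100(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

num_val = len(train_set) // 10            # 10% held out (5,000 images)
num_train = len(train_set) - num_val      # remaining 90% (45,000 images)
search_train, search_val = random_split(
    train_set, [num_train, num_val],
    generator=torch.Generator().manual_seed(0))  # seed is an assumption
```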
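The "Experiment Setup" row summarizes the training recipe and the KD hyperparameters. Below is a hedged PyTorch sketch of that configuration (the paper reports using PyTorch for training). The function names, the `student_logits`/`teacher_logits` placeholders, and the balancing factor `lam` are illustrative; the exact value of λ is not recoverable from the extracted text, so `lam=0.5` is purely an assumption.

```python
# Hedged sketch of the reported training recipe: SGD with Nesterov
# momentum 0.9, weight decay 1e-4, batch size 128, lr 0.1 divided by 10
# at epochs 150 and 225 (300 epochs total), and a KD loss with T = 3.
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, targets, T=3.0, lam=0.5):
    """Hinton-style KD objective; the lambda value here is an assumption."""
    ce = F.cross_entropy(student_logits, targets)
    kl = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                  F.softmax(teacher_logits / T, dim=1),
                  reduction="batchmean") * (T * T)
    return (1.0 - lam) * ce + lam * kl

def make_optimizer_and_scheduler(model):
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9,
                                weight_decay=1e-4, nesterov=True)
    # Learning rate is divided by 10 at the 150th and 225th epochs.
    # The reported warm-up (lr 0.01 for the first 400 or 900 iterations
    # with ResNet-110) would be applied before this schedule takes effect;
    # it is omitted here for brevity.
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[150, 225], gamma=0.1)
    return optimizer, scheduler
```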