Learnable Embedding Space for Efficient Neural Architecture Compression

Authors: Shengcao Cao, Xiaofang Wang, Kris M. Kitani

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that our search algorithm can significantly outperform various baseline methods, such as random search and reinforcement learning (Ashok et al., 2018). The compressed architectures found by our method are also better than the state-of-the-art manually-designed compact architecture ShuffleNet (Zhang et al., 2018). We first extensively evaluate our algorithm with different teacher architectures and datasets. We then compare the automatically found compressed architectures to the state-of-the-art manually-designed compact architecture, ShuffleNet (Zhang et al., 2018). We also evaluate the transfer performance of the learned embedding space and kernel. We perform an ablation study to understand how the number of kernels K and other design choices in our search algorithm influence the performance.
Researcher Affiliation | Academia | Shengcao Cao, School of EECS, Peking University, Beijing, 100871, China (caoshengcao@pku.edu.cn); Xiaofang Wang & Kris M. Kitani, The Robotics Institute, Carnegie Mellon University, Pittsburgh, PA 15213, USA ({xiaofan2,kkitani}@cs.cmu.edu)
Pseudocode | Yes | A formal sketch of our search algorithm is shown in Algorithm 1.
Open Source Code | No | The paper does not contain an explicit statement about releasing its source code or a link to a code repository for the methodology described.
Open Datasets | Yes | We use two datasets: CIFAR-10 and CIFAR-100 (Krizhevsky & Hinton, 2009). CIFAR-10 contains 60K images in 10 classes, with 6K images per class. CIFAR-100 also contains 60K images but in 100 classes, with 600 images per class. Both CIFAR-10 and CIFAR-100 are divided into a training set with 50K images and a test set with 10K images.
Dataset Splits | Yes | We sample 5K images from the training set as the validation set.
Hardware Specification | No | The paper mentions 'a single GPU' generally but does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers.
Experiment Setup | Yes | When evaluating an architecture during the search process, we only train it for 10 epochs to reduce computation time. So for both RS and our method, we fully train the top 4 architectures among the 160 evaluated architectures and choose the best one as the solution. For our proposed method, we run 20 architecture search steps, where each step generates K = 8 architectures for evaluation based on the K different kernels. When learning the kernel function parameters, we randomly sample from the set of the evaluated architectures with a probability of 0.5 to form the training set for one kernel.
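
The 45K/5K split quoted in the Dataset Splits row above is straightforward to reproduce even though the authors' code is not released. Below is a minimal sketch assuming PyTorch and torchvision; the transform and the fixed seed are assumptions, not details taken from the paper.

```python
# Minimal sketch (assumed tooling: PyTorch + torchvision), not the authors' code.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # placeholder; the paper's preprocessing is not reproduced here

# The CIFAR-10 training set has 50K images; hold out 5K of them for validation.
full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)
train_set, val_set = random_split(
    full_train, [45_000, 5_000],
    generator=torch.Generator().manual_seed(0),  # assumed seed for a reproducible split
)

# The official 10K-image test set is kept untouched for the final evaluation.
test_set = datasets.CIFAR10(root="./data", train=False, download=True, transform=transform)
```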
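
To make the Experiment Setup numbers concrete (20 search steps, K = 8 kernels, a 10-epoch proxy evaluation, a probability-0.5 sample for each kernel's training set, and full training of the top 4 of 160 evaluated architectures), here is a hypothetical Python skeleton of that protocol. The Kernel class, propose_architecture, train_and_evaluate, and train_fully_and_evaluate are placeholder stubs, since the paper's actual kernel learning, acquisition function, and training code are not available.

```python
import random

# --- Placeholder stubs (assumptions, not the authors' implementation) ---------
class Kernel:
    def fit(self, pairs):
        # Would learn kernel / embedding parameters from (architecture, accuracy) pairs.
        pass

def propose_architecture(kernel, evaluated):
    # Would maximize an acquisition function under this kernel; stubbed out here.
    return {"arch_id": random.random()}

def train_and_evaluate(arch, epochs):
    # Would train the architecture for `epochs` epochs and return validation accuracy.
    return random.random()

def train_fully_and_evaluate(arch):
    # Would fully train the architecture (the paper's full-training schedule is not reproduced here).
    return random.random()

# --- Search protocol as described in the Experiment Setup row -----------------
K, SEARCH_STEPS, PROXY_EPOCHS, TOP_N = 8, 20, 10, 4

kernels = [Kernel() for _ in range(K)]
evaluated = []  # (architecture, proxy validation accuracy) pairs; 8 x 20 = 160 in total

for step in range(SEARCH_STEPS):
    # Each of the K kernels proposes one architecture per search step.
    for kernel in kernels:
        arch = propose_architecture(kernel, evaluated)
        acc = train_and_evaluate(arch, epochs=PROXY_EPOCHS)  # cheap 10-epoch proxy evaluation
        evaluated.append((arch, acc))

    # Re-fit each kernel on a random subset of all evaluated architectures,
    # keeping each point with probability 0.5.
    for kernel in kernels:
        kernel.fit([pair for pair in evaluated if random.random() < 0.5])

# Fully train the top-4 proxy-ranked architectures and keep the best one as the solution.
finalists = sorted(evaluated, key=lambda pair: pair[1], reverse=True)[:TOP_N]
best_arch = max(finalists, key=lambda pair: train_fully_and_evaluate(pair[0]))[0]
```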