Network Augmentation for Tiny Deep Learning

Authors: Han Cai, Chuang Gan, Ji Lin, Song Han

ICLR 2022

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | Extensive experiments on ImageNet (ImageNet, ImageNet-21K-P) and five fine-grained image classification datasets (Food101, Flowers102, Cars, CUB200, and Pets) show that NetAug is much more effective than regularization techniques for tiny neural networks. |
| Researcher Affiliation | Collaboration | ¹Massachusetts Institute of Technology, ²MIT-IBM Watson AI Lab |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks, nor clearly labeled algorithm sections or code-like formatted procedures. |
| Open Source Code | Yes | Code and pre-trained weights: https://github.com/mit-han-lab/tinyml |
| Open Datasets | Yes | Datasets. We conducted experiments on seven image classification datasets, including ImageNet (Deng et al., 2009), ImageNet-21K-P (winter21 version) (Ridnik et al., 2021), Food101 (Bossard et al., 2014), Flowers102 (Nilsback & Zisserman, 2008), Cars (Krause et al., 2013), CUB200 (Wah et al., 2011), and Pets (Parkhi et al., 2012). In addition to image classification, we also evaluated our method on Pascal VOC object detection (Everingham et al., 2010) and COCO object detection (Lin et al., 2014). |
| Dataset Splits | Yes | The training set consists of the Pascal VOC 2007 trainval set and the Pascal VOC 2012 trainval set, while the Pascal VOC 2007 test set is used for testing. (A loading sketch for this split is given after the table.) |
| Hardware Specification | No | The paper specifies the number of GPUs used for training (e.g., "16 GPUs", "4 GPUs", "8 GPUs") but does not provide specific hardware models such as GPU types (e.g., NVIDIA A100, Tesla V100) or CPU details. |
| Software Dependencies | No | The paper does not provide specific ancillary software details, such as library or solver names with version numbers. |
| Experiment Setup | Yes | For ImageNet experiments, we train models with batch size 2048 using 16 GPUs. We use the SGD optimizer with Nesterov momentum 0.9 and weight decay 4e-5. By default, the models are trained for 150 epochs on ImageNet and 20 epochs on ImageNet-21K-P, unless stated explicitly. The initial learning rate is 0.4 and gradually decreases to 0 following the cosine schedule. Label smoothing is used with a factor of 0.1 on ImageNet. (A training-configuration sketch based on this recipe is given after the table.) |
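
The Pascal VOC split noted under "Dataset Splits" (2007+2012 trainval for training, 2007 test for evaluation) can be assembled with torchvision's VOCDetection dataset. The sketch below is illustrative only, not the authors' code; the data root path and download flag are assumptions.

```python
# Minimal sketch of the Pascal VOC 07+12 split described in "Dataset Splits",
# using torchvision's VOCDetection loader. Paths and flags are assumptions.
from torch.utils.data import ConcatDataset
from torchvision.datasets import VOCDetection

root = "./data/voc"  # hypothetical local data directory

# Training data: VOC 2007 trainval + VOC 2012 trainval
train_set = ConcatDataset([
    VOCDetection(root, year="2007", image_set="trainval", download=True),
    VOCDetection(root, year="2012", image_set="trainval", download=True),
])

# Evaluation data: VOC 2007 test
test_set = VOCDetection(root, year="2007", image_set="test", download=True)
```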
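
The ImageNet recipe quoted under "Experiment Setup" maps onto a standard PyTorch configuration: SGD with Nesterov momentum 0.9, weight decay 4e-5, an initial learning rate of 0.4 decayed to 0 with a cosine schedule over 150 epochs, and label smoothing 0.1. The sketch below is a minimal reconstruction under those stated hyperparameters; the helper name and structure are ours, not taken from the paper or the released code.

```python
# Minimal sketch of the quoted ImageNet training recipe (hyperparameters from
# the paper; function name and structure are illustrative assumptions).
import torch
import torch.nn as nn

def build_imagenet_training_setup(model: nn.Module, epochs: int = 150,
                                  base_lr: float = 0.4):
    optimizer = torch.optim.SGD(
        model.parameters(),
        lr=base_lr,            # initial learning rate 0.4
        momentum=0.9,
        nesterov=True,
        weight_decay=4e-5,
    )
    # Cosine schedule that decays the learning rate to 0 over training
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs, eta_min=0.0
    )
    # Cross-entropy with label smoothing factor 0.1 (PyTorch >= 1.10)
    criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
    return optimizer, scheduler, criterion
```

With the stated global batch size of 2048 spread over 16 GPUs, each GPU would process 128 samples per optimization step under this configuration.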