Boost Neural Networks by Checkpoints

Authors: Feng Wang, Guoyizhe Wei, Qiao Liu, Jinxiang Ou, Xian Wei, Hairong Lv

NeurIPS 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical evaluation also indicates our proposed ensemble outperforms a single model and existing ensembles in terms of accuracy and efficiency. With the same training budget, our method achieves 4.16% lower error on CIFAR-100 and 6.96% on Tiny-ImageNet with the ResNet-110 architecture. ... 5 Experiments: We compare the effectiveness of CBNN with competitive baselines in this section. All the experiments are conducted on four benchmark datasets: CIFAR-10, CIFAR-100 [Krizhevsky et al., 2009], Tiny-ImageNet [Le and Yang, 2015], and ImageNet ILSVRC 2012 [Deng et al., 2009].
Researcher Affiliation | Academia | Feng Wang (1), Guoyizhe Wei (1), Qiao Liu (2), Jinxiang Ou (1), Xian Wei (3), Hairong Lv (1,4); (1) Department of Automation, Tsinghua University; (2) Department of Statistics, Stanford University; (3) Software Engineering Institute, East China Normal University; (4) Fuzhou Institute of Data Technology
Pseudocode | Yes | Algorithm 1: Checkpoint-Boosted Neural Networks
Open Source Code | No | The paper does not provide any links to a code repository or explicit statements about the release of source code.
Open Datasets | Yes | All the experiments are conducted on four benchmark datasets: CIFAR-10, CIFAR-100 [Krizhevsky et al., 2009], Tiny-ImageNet [Le and Yang, 2015], and ImageNet ILSVRC 2012 [Deng et al., 2009].
Dataset Splits | Yes | All the experiments are conducted on four benchmark datasets: CIFAR-10, CIFAR-100 [Krizhevsky et al., 2009], Tiny-ImageNet [Le and Yang, 2015], and ImageNet ILSVRC 2012 [Deng et al., 2009].
Hardware Specification | Yes | Table 2 summarizes the time consumption of different methods on Nvidia Tesla P40 GPUs.
Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments.
Experiment Setup | Yes | 5.1 Experiment Setup: ... All the models are trained with a 0.2 dropout rate (0.3 for EfficientNet-B3) and images are augmented by AutoAugment [Cubuk et al., 2019]. ... adopting a standard decaying learning rate that is initialized to 0.05 and drops by 96% every two epochs, with a five-epoch warmup [Gotmare et al., 2018]. ... In Snapshot Ensemble, the learning rate scheduling rules follow [Huang et al., 2017a] and we set α = 0.2, which achieves better performance in our experiments. Similarly, we set r1 = 0.1, r2 = 0.5, p1 = 2 and p2 = 6 for Snapshot Boosting. ... and set α1 = 5 × 10−2, α2 = 5 × 10−4 for all the datasets and DNN architectures. Our method, CBNN, adopts the learning rate used in training the Single Model as well, setting η = 0.01. ... train the DNNs from scratch for 200 epochs on CIFAR-10, CIFAR-100, Tiny-ImageNet and 300 epochs on ImageNet. We save six checkpoint models for SSE and FGE...
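The quoted learning-rate rule (initial value 0.05, a five-epoch warmup, then a step decay every two epochs) can be sketched as a small schedule function. This is an illustrative reconstruction, not the authors' code: it assumes linear warmup and reads "drops by 96% every two epochs" as "decays to 96% of its value every two epochs"; the function name and parameter defaults are ours.

```python
def lr_schedule(epoch, base_lr=0.05, warmup_epochs=5, decay=0.96, decay_every=2):
    """Per-epoch learning rate: linear warmup, then step-exponential decay.

    Assumptions (not confirmed by the paper's text): warmup is linear,
    and "drops by 96%" means multiply by 0.96 every two epochs.
    """
    if epoch < warmup_epochs:
        # Linear warmup from base_lr/warmup_epochs up to base_lr.
        return base_lr * (epoch + 1) / warmup_epochs
    # Number of completed two-epoch decay steps since warmup ended.
    steps = (epoch - warmup_epochs) // decay_every
    return base_lr * (decay ** steps)
```

A schedule like this is typically wired into training via a per-epoch callback (e.g. PyTorch's `torch.optim.lr_scheduler.LambdaLR` with a multiplicative variant of the same function).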