Boost Neural Networks by Checkpoints
Authors: Feng Wang, Guoyizhe Wei, Qiao Liu, Jinxiang Ou, Xian Wei, Hairong Lv
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical evaluation also indicates our proposed ensemble outperforms a single model and existing ensembles in terms of accuracy and efficiency. With the same training budget, our method achieves 4.16% lower error on CIFAR-100 and 6.96% on Tiny-ImageNet with the ResNet-110 architecture. ... (Section 5, Experiments) We compare the effectiveness of CBNN with competitive baselines in this section. All the experiments are conducted on four benchmark datasets: CIFAR-10, CIFAR-100 [Krizhevsky et al., 2009], Tiny ImageNet [Le and Yang, 2015], and ImageNet ILSVRC 2012 [Deng et al., 2009]. |
| Researcher Affiliation | Academia | Feng Wang (1), Guoyizhe Wei (1), Qiao Liu (2), Jinxiang Ou (1), Xian Wei (3), Hairong Lv (1,4). (1) Department of Automation, Tsinghua University; (2) Department of Statistics, Stanford University; (3) Software Engineering Institute, East China Normal University; (4) Fuzhou Institute of Data Technology |
| Pseudocode | Yes | Algorithm 1: Checkpoint-Boosted Neural Networks (a generic checkpoint-ensembling sketch follows the table) |
| Open Source Code | No | The paper does not provide any specific links to a code repository or explicit statements about the release of source code. |
| Open Datasets | Yes | All the experiments are conducted on four benchmark datasets: CIFAR-10, CIFAR-100 [Krizhevsky et al., 2009], Tiny ImageNet [Le and Yang, 2015], and ImageNet ILSVRC 2012 [Deng et al., 2009]. |
| Dataset Splits | Yes | All the experiments are conducted on four benchmark datasets: CIFAR-10, CIFAR-100 [Krizhevsky et al., 2009], Tiny ImageNet [Le and Yang, 2015], and ImageNet ILSVRC 2012 [Deng et al., 2009]. |
| Hardware Specification | Yes | Table 2 summarizes the time consumption of different methods on Nvidia Tesla P40 GPUs |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, libraries, or frameworks used in the experiments. |
| Experiment Setup | Yes | 5.1 Experiment Setup ... All the models are trained with a 0.2 dropout rate (0.3 for EfficientNet-B3) and images are augmented by AutoAugment [Cubuk et al., 2019]. ... adopting a standard decaying learning rate that is initialized to 0.05 and drops by 96% every two epochs, with five epochs of warmup [Gotmare et al., 2018]. ... In Snapshot Ensemble, the learning rate scheduling rules follow [Huang et al., 2017a] and we set α = 0.2, which achieves better performance in our experiments. Similarly, we set r1 = 0.1, r2 = 0.5, p1 = 2 and p2 = 6 for Snapshot Boosting. ... and set α1 = 5 × 10^−2, α2 = 5 × 10^−4 for all the datasets and DNN architectures. Our method, CBNN, adopts the learning rate used in training the Single Model as well, setting η = 0.01. ... train the DNNs from scratch for 200 epochs on CIFAR-10, CIFAR-100, Tiny-ImageNet and 300 epochs on ImageNet. We save six checkpoint models for SSE and FGE... (both learning-rate schedules are sketched in code after the table) |
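The Pseudocode row above only names Algorithm 1 (Checkpoint-Boosted Neural Networks) without reproducing it. The following is therefore a generic checkpoint-ensembling sketch, not the authors' CBNN procedure: checkpoints saved during a single training run are treated as ensemble members whose softmax outputs are combined with per-member weights. CBNN's boosting-derived member weights (and the role of its η = 0.01) are not given in the excerpts, so uniform weights stand in as a placeholder.

```python
import copy
import torch

@torch.no_grad()
def checkpoint_ensemble_predict(model, checkpoint_paths, x, weights=None):
    """Combine the softmax outputs of several checkpoints of one model.

    `checkpoint_paths` point to state_dict files saved during a single
    training run. Uniform `weights` are a placeholder: CBNN derives member
    weights via a boosting-style rule not shown in the excerpts above.
    """
    if weights is None:
        weights = [1.0 / len(checkpoint_paths)] * len(checkpoint_paths)
    member = copy.deepcopy(model)  # avoid clobbering the caller's weights
    probs = None
    for path, w in zip(checkpoint_paths, weights):
        member.load_state_dict(torch.load(path, map_location="cpu"))
        member.eval()
        p = w * torch.softmax(member(x), dim=1)
        probs = p if probs is None else probs + p
    return probs.argmax(dim=1)
```

Because the members come from one run, this costs a single training budget plus k forward passes at inference, which is the efficiency argument the Research Type row quotes.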
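The Experiment Setup row pins down two learning-rate schedules: a step decay initialized to 0.05 that "drops by 96% every two epochs" after a five-epoch warmup, and the cyclic cosine schedule of Huang et al. [2017a] used for Snapshot Ensemble (α = 0.2, six snapshots). Below is a minimal sketch of both; it assumes "drops by 96%" means the rate is multiplied by 0.96 each step (the wording could also be read as a 0.04 multiplier) and uses the standard snapshot formula lr(t) = (α/2)(cos(π·(t mod ⌈T/M⌉)/⌈T/M⌉) + 1).

```python
import math

def step_decay_lr(epoch, base_lr=0.05, warmup_epochs=5,
                  decay=0.96, decay_every=2):
    """Step schedule from the setup row: linear warmup for five epochs,
    then multiply by `decay` every `decay_every` epochs.
    Reading "drops by 96%" as a 0.96 multiplier is an assumption."""
    if epoch < warmup_epochs:
        return base_lr * (epoch + 1) / warmup_epochs
    steps = (epoch - warmup_epochs) // decay_every
    return base_lr * (decay ** steps)

def snapshot_cosine_lr(iteration, total_iters, cycles=6, alpha=0.2):
    """Cyclic cosine annealing of Huang et al. [2017a]: the rate restarts
    to `alpha` at the start of each of `cycles` cycles and anneals toward
    zero; a snapshot is saved at each cycle's minimum."""
    cycle_len = math.ceil(total_iters / cycles)
    t = iteration % cycle_len
    return (alpha / 2.0) * (math.cos(math.pi * t / cycle_len) + 1.0)
```

In a training loop these would be applied by setting `optimizer.param_groups[0]["lr"]` each epoch (step decay) or each iteration (cyclic cosine); both functions are pure, so they can be checked directly against the quoted hyperparameters.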