SEDONA: Search for Decoupled Neural Networks toward Greedy Block-wise Learning

Authors: Myeongjang Pyeon, Jihwan Moon, Taeyoung Hahn, Gunhee Kim

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results show that our algorithm can consistently discover transferable decoupled architectures for VGG and ResNet variants, and significantly outperforms the ones trained with end-to-end backpropagation and other state-of-the-art greedy-learning methods in CIFAR-10, Tiny-ImageNet and ImageNet. We experiment with the proposed SEDONA in two stages: search and evaluation. In the search stage, SEDONA searches for the best decoupling configuration for a given neural network on CIFAR-10 to minimize the validation loss. In the evaluation stage, we split the networks according to the searched configuration and evaluate their greedy block-wise learning performance for classification on CIFAR-10 (Krizhevsky & Hinton, 2009), Tiny-ImageNet and ImageNet (Russakovsky et al., 2015). (Greedy block-wise training is sketched below the table.)
Researcher Affiliation | Academia | Myeongjang Pyeon, Jihwan Moon, Taeyoung Hahn, and Gunhee Kim, Seoul National University, Seoul, Korea
Pseudocode | Yes | Algorithm 1: SEDONA Searching for Decoupled Neural Architectures
Open Source Code | No | For PredSim, DGL and Features Replay implementations, we refer to their official PyTorch implementations. PredSim: https://github.com/anokland/local-loss, DGL: https://github.com/eugenium/DGL, Features Replay: https://github.com/slowbull/FeaturesReplay. The paper does not state that the code for SEDONA itself is open source or provide a link to it.
Open Datasets | Yes | We evaluate their greedy block-wise learning performance for classification in CIFAR-10 (Krizhevsky & Hinton, 2009), Tiny-ImageNet (http://tiny-imagenet.herokuapp.com/) and ImageNet (Russakovsky et al., 2015).
Dataset Splits | Yes | We use 40% of the CIFAR-10 training split as a validation set. 10% of the train data is used as the validation set. (A split sketch follows the table.)
Hardware Specification | Yes | All experiments are conducted with a total of 8 NVIDIA Quadro 6000 GPU cards and 2 8-core Intel Xeon E5-2620 v4 processors with 256 GB RAM.
Software Dependencies | Yes | For implementation, we use Python 3.8 and PyTorch 1.6.0. At the search stage, we use the higher library to enable differentiable weight updates in PyTorch computational graphs. For evaluation, we implement asynchronous updates of blocks by introducing queues between blocks. For PredSim, DGL and Features Replay implementations, we refer to their official PyTorch implementations. We use mixed precision training with Apex on Tiny-ImageNet and ImageNet. (A higher usage sketch follows the table.)
Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015) with a fixed learning rate of 0.01 and a weight decay of 0.000001. For the inner optimization, we use SGD with a momentum of 0.9 and a weight decay of 0.001. We use an initial learning rate of 0.1 and decay it down to 0.001 with cosine annealing learning rate decay (Loshchilov & Hutter, 2017). Label smoothing (Szegedy et al., 2016) of 0.1 is also used. We repeat the bilevel optimization steps for 2K iterations. As mentioned in Section 3.3, we pretrain weights for 40K iterations with the outer variables fixed as zero and store 50 sets of weights with the best validation accuracies. (An optimizer-setup sketch follows the table.)
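
The Research Type row above summarizes SEDONA's two-stage protocol: searching for a decoupling configuration and then training the resulting blocks greedily. The following is a minimal sketch of greedy block-wise learning, with hypothetical block and auxiliary-head definitions; it is not the paper's implementation (which uses the searched split points, the paper's auxiliary losses, and asynchronous queues between blocks), only an illustration of local, detached updates.

```python
import torch
import torch.nn as nn

# Hypothetical blocks and auxiliary classifier heads (shapes chosen for CIFAR-10-like input).
blocks = nn.ModuleList([
    nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
    nn.Sequential(nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2)),
])
aux_heads = nn.ModuleList([
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, 10)),
    nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, 10)),
])
# One optimizer per block, so each block is updated from its own local loss only.
optims = [torch.optim.SGD(list(b.parameters()) + list(h.parameters()),
                          lr=0.1, momentum=0.9, weight_decay=1e-3)
          for b, h in zip(blocks, aux_heads)]
criterion = nn.CrossEntropyLoss()

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
for block, head, opt in zip(blocks, aux_heads, optims):
    feat = block(x)                  # forward through this block only
    loss = criterion(head(feat), y)  # local auxiliary loss
    opt.zero_grad()
    loss.backward()                  # gradients stay within this block
    opt.step()
    x = feat.detach()                # stop gradients before the next block
```

The design point this sketch illustrates is the detach between blocks: no gradient crosses a block boundary, so there is no end-to-end backpropagation, and the block boundaries themselves are what SEDONA's search stage chooses.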
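
The Dataset Splits row quotes a 40% validation split of the CIFAR-10 training data for the search stage. A minimal sketch of such a split with torchvision; the seed, paths, and variable names are assumptions, not taken from the paper.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Hold out 40% of the CIFAR-10 training set as a validation set (hypothetical seed).
train_full = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())
n_val = int(0.4 * len(train_full))       # 20,000 images
n_train = len(train_full) - n_val        # 30,000 images
train_set, val_set = random_split(
    train_full, [n_train, n_val],
    generator=torch.Generator().manual_seed(0))
```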
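
The Software Dependencies row mentions the higher library for differentiable weight updates in PyTorch graphs. A minimal sketch of that pattern; the model, data, and losses are placeholders, not SEDONA's actual search objective.

```python
import torch
import torch.nn as nn
import higher

model = nn.Linear(10, 2)
inner_opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
x, y = torch.randn(4, 10), torch.randint(0, 2, (4,))

# innerloop_ctx wraps the model and optimizer so that parameter updates
# remain part of the computational graph.
with higher.innerloop_ctx(model, inner_opt) as (fmodel, diffopt):
    inner_loss = nn.functional.cross_entropy(fmodel(x), y)
    diffopt.step(inner_loss)  # differentiable weight update
    val_loss = nn.functional.cross_entropy(fmodel(x), y)
    # val_loss is now differentiable through the inner update; in the paper's
    # bilevel setup, gradients would flow back to the outer decoupling
    # variables, which are not modeled in this sketch.
```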
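
The Experiment Setup row lists the outer Adam and inner SGD settings. A hedged sketch of those hyperparameters in PyTorch; the outer variables and model are hypothetical stand-ins, and mapping the scheduler's T_max to the 2K bilevel iterations is an assumption.

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)                              # placeholder network
outer_params = [torch.zeros(5, requires_grad=True)]   # assumed stand-in for decoupling variables

# Outer optimization: Adam, fixed lr 0.01, weight decay 1e-6 (as quoted).
outer_opt = torch.optim.Adam(outer_params, lr=0.01, weight_decay=1e-6)
# Inner optimization: SGD with momentum 0.9 and weight decay 0.001.
inner_opt = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=1e-3)
# Cosine annealing from 0.1 down to 0.001; T_max=2000 assumes one step per bilevel iteration.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
    inner_opt, T_max=2000, eta_min=0.001)
# Label smoothing of 0.1. Note: CrossEntropyLoss gained this argument in PyTorch 1.10;
# with PyTorch 1.6.0 as quoted, a custom smoothed loss would be needed instead.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```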