SEDONA: Search for Decoupled Neural Networks toward Greedy Block-wise Learning
Authors: Myeongjang Pyeon, Jihwan Moon, Taeyoung Hahn, Gunhee Kim
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our algorithm can consistently discover transferable decoupled architectures for VGG and ResNet variants, and significantly outperforms the ones trained with end-to-end backpropagation and other state-of-the-art greedy learning methods on CIFAR-10, Tiny-ImageNet and ImageNet. We evaluate the proposed SEDONA in two stages: search and evaluation. In the search stage, SEDONA searches for the best decoupling configuration for a given neural network on CIFAR-10 to minimize the validation loss. In the evaluation stage, we split the networks according to the searched configuration, and evaluate their greedy block-wise learning performance for classification on CIFAR-10 (Krizhevsky & Hinton, 2009), Tiny-ImageNet and ImageNet (Russakovsky et al., 2015). |
| Researcher Affiliation | Academia | Myeongjang Pyeon, Jihwan Moon, Taeyoung Hahn, and Gunhee Kim Seoul National University, Seoul, Korea |
| Pseudocode | Yes | Algorithm 1: SEDONA Searching for Decoupled Neural Architectures |
| Open Source Code | No | For PredSim, DGL and FeaturesReplay implementations, we refer to their official PyTorch implementations. PredSim: https://github.com/anokland/local-loss, DGL: https://github.com/eugenium/DGL, FeaturesReplay: https://github.com/slowbull/FeaturesReplay. The paper does not state that the code for SEDONA itself is open source or provide a link to it. |
| Open Datasets | Yes | evaluate their greedy block-wise learning performance for classification on CIFAR-10 (Krizhevsky & Hinton, 2009), Tiny-ImageNet (http://tiny-imagenet.herokuapp.com/) and ImageNet (Russakovsky et al., 2015). |
| Dataset Splits | Yes | We use 40% of the CIFAR-10 training split as a validation set. 10% of the training data is used as the validation set. (A split sketch appears after the table.) |
| Hardware Specification | Yes | All experiments are conducted with a total of 8 NVIDIA Quadro 6000 GPU cards and two 8-core Intel Xeon E5-2620 v4 processors with 256 GB RAM. |
| Software Dependencies | Yes | For implementation, we use Python 3.8 and PyTorch 1.6.0. At the search stage, we use the higher library to enable differentiable weight updates in PyTorch computational graphs (see the `higher` sketch after the table). For evaluation, we implement asynchronous updates of blocks by introducing queues between blocks. For PredSim, DGL and FeaturesReplay implementations, we refer to their official PyTorch implementations. We use mixed precision training with Apex on Tiny-ImageNet and ImageNet. |
| Experiment Setup | Yes | For the outer optimization, we use the Adam optimizer (Kingma & Ba, 2015) with a fixed learning rate of 0.01 and a weight decay of 0.000001. For the inner optimization, we use SGD with a momentum of 0.9 and a weight decay of 0.001. We use an initial learning rate of 0.1 and decay it down to 0.001 with cosine annealing (Loshchilov & Hutter, 2017). Label smoothing (Szegedy et al., 2016) of 0.1 is also used. We repeat bilevel optimization steps for 2K iterations. As mentioned in Section 3.3, we pretrain weights for 40K iterations with outer variables fixed as zero and store 50 sets of weights with the best validation accuracies. (These settings are sketched in code after the table.) |
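
The reported 40% CIFAR-10 validation split can be reproduced along the following lines. This is a minimal sketch, not the authors' code; the batch size and random seed are assumptions.

```python
# Sketch of the reported CIFAR-10 split: hold out 40% of the training set as validation.
import torch
from torchvision import datasets, transforms

train_full = datasets.CIFAR10(root="./data", train=True, download=True,
                              transform=transforms.ToTensor())

n_val = int(0.4 * len(train_full))            # 40% validation, as stated in the paper
n_train = len(train_full) - n_val
train_set, val_set = torch.utils.data.random_split(
    train_full, [n_train, n_val],
    generator=torch.Generator().manual_seed(0))  # seed is an assumption

train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=128, shuffle=False)
```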
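The search-stage dependency on the `higher` library can be illustrated with the bilevel step below: an inner SGD update on training data kept differentiable, followed by an outer Adam update on the validation loss. The placeholder network, the scalar `gate` variable, and the auxiliary loss are illustrative assumptions standing in for SEDONA's decoupling configuration; only the optimizer hyperparameters are taken from the paper.

```python
# Minimal bilevel-step sketch with the higher library (not the authors' code).
import torch
import torch.nn as nn
import torch.nn.functional as F
import higher

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))   # placeholder network
inner_opt = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-3)

# Outer variable (illustrative): a scalar gate mixing an auxiliary loss into the
# inner objective, updated with Adam as reported in the paper.
gate = torch.zeros(1, requires_grad=True)
outer_opt = torch.optim.Adam([gate], lr=0.01, weight_decay=1e-6)

def bilevel_step(x_train, y_train, x_val, y_val):
    with higher.innerloop_ctx(model, inner_opt, copy_initial_weights=True) as (fmodel, diffopt):
        # Inner step: differentiable weight update on training data.
        logits = fmodel(x_train)
        main_loss = F.cross_entropy(logits, y_train)
        aux_loss = logits.pow(2).mean()                            # placeholder auxiliary loss
        inner_loss = main_loss + torch.sigmoid(gate).squeeze() * aux_loss
        diffopt.step(inner_loss)

        # Outer step: minimize validation loss; the gradient reaches `gate`
        # through the differentiable inner update recorded by higher.
        val_loss = F.cross_entropy(fmodel(x_val), y_val)
        outer_opt.zero_grad()
        val_loss.backward()
        outer_opt.step()
    return val_loss.item()

# Illustrative call with random data shaped like CIFAR-10 batches.
x_tr, y_tr = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
x_va, y_va = torch.randn(8, 3, 32, 32), torch.randint(0, 10, (8,))
bilevel_step(x_tr, y_tr, x_va, y_va)
```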
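The evaluation-stage hyperparameters quoted above (SGD with momentum 0.9 and weight decay 0.001, cosine annealing from 0.1 to 0.001, label smoothing 0.1) map onto standard PyTorch components as sketched below. The network choice, `T_max`, and the use of the built-in `label_smoothing` argument are assumptions, not the authors' exact setup.

```python
# Sketch of the reported evaluation-stage optimization settings (not the authors' code).
import torch
import torch.nn as nn
from torchvision.models import vgg16

net = vgg16(num_classes=10)                  # any VGG/ResNet variant from the paper

optimizer = torch.optim.SGD(net.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-3)
# Cosine annealing from 0.1 down to 0.001, as reported.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=1e-3)

# Label smoothing of 0.1; the `label_smoothing` argument requires PyTorch >= 1.10
# (the paper used PyTorch 1.6.0, where this would need a custom loss).
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
```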