High-Capacity Expert Binary Networks

Authors: Adrian Bulat, Brais Martinez, Georgios Tzimiropoulos

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Overall, our method improves upon prior work, with no increase in computational cost, by 6%, reaching a groundbreaking 71% on ImageNet classification. Fig. 1b confirms this experimentally by t-SNE embedding visualisation of the features before the classifier along with the corresponding expert that was activated for each sample of the ImageNet validation set. (See the t-SNE sketch after the table.)
Researcher Affiliation | Collaboration | Adrian Bulat, Samsung AI Cambridge, adrian@adrianbulat.com; Brais Martinez, Samsung AI Cambridge, brais.a@samsung.com; Georgios Tzimiropoulos, Samsung AI Cambridge and Queen Mary University of London, UK, g.tzimiropoulos@qmul.ac.uk
Pseudocode | No | Overall, our optimization policy can be summarized as follows: 1. Train one expert, parametrized by θ0, using real weights and binary activations. 2. Replicate θ0 to all θi, i ∈ {1, ..., N−1}, to initialize the matrix Θ. 3. Train the model initialized in step 2 using real weights and binary activations. 4. Train the model obtained from step 3 using binary weights and activations. This is a descriptive list of steps, not pseudocode or an algorithm block. (A code sketch of this staged policy follows the table.)
Open Source Code | No | Code will be made available here.
Open Datasets | Yes | We compared our method against the current state-of-the-art in binary networks on the ImageNet dataset (Deng et al., 2009). Additional comparisons, including on CIFAR-100 (Krizhevsky et al., 2009), can be found in the supplementary material in Section A.2.
Dataset Splits | Yes | Fig. 1b confirms this experimentally by t-SNE embedding visualisation of the features before the classifier along with the corresponding expert that was activated for each sample of the ImageNet validation set. The images are augmented following the common strategy used in prior work (He et al., 2016) by randomly scaling and cropping the images to a resolution of 224×224px. (See the augmentation sketch after the table.)
Hardware Specification | Yes | All models were trained on 4 V100 GPUs and implemented using PyTorch (Paszke et al., 2019).
Software Dependencies | No | All models were trained on 4 V100 GPUs and implemented using PyTorch (Paszke et al., 2019). The mention of 'PyTorch' lacks a specific version number.
Experiment Setup | Yes | The training procedure largely follows that of Martinez et al. (2020). In particular, we trained our networks using the Adam optimizer (Kingma & Ba, 2014) for 75 epochs, using a learning rate of 10^-3 that is decreased by a factor of 10 at epochs 40, 55 and 65. During Stage I, we set the weight decay to 10^-5, and to 0 during Stage II. Furthermore, following Martinez et al. (2020), during the first 10 epochs we apply a learning rate warm-up (Goyal et al., 2017). (See the optimizer/schedule sketch after the table.)
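
The t-SNE visualisation referenced in the "Research Type" and "Dataset Splits" rows (Fig. 1b: pre-classifier features of ImageNet validation samples, coloured by the expert that was activated) can be reproduced in outline with scikit-learn and matplotlib. This is a minimal sketch under the assumption that the features and per-sample expert indices have already been extracted from the model; none of the names below come from the authors' code.

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_expert_tsne(features: np.ndarray, expert_ids: np.ndarray) -> None:
    """features: (num_samples, dim) pre-classifier features of validation samples;
    expert_ids: (num_samples,) index of the expert activated for each sample.
    Both arrays are assumed to have been collected from the model beforehand."""
    embedding = TSNE(n_components=2, init="pca", random_state=0).fit_transform(features)
    plt.figure(figsize=(6, 6))
    plt.scatter(embedding[:, 0], embedding[:, 1], c=expert_ids, s=2, cmap="tab10")
    plt.title("t-SNE of pre-classifier features, coloured by activated expert")
    plt.colorbar(label="expert index")
    plt.show()
```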
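
The four-step optimization policy quoted in the "Pseudocode" row can be expressed as code. The sketch below only illustrates the staged schedule (real weights and binary activations for a single expert, replication of θ0 across experts, then full binarization); `model_factory`, `train_one_stage`, `load_expert_weights` and `set_binary_weights` are hypothetical placeholders, not the authors' API.

```python
import copy

def train_expert_binary_network(model_factory, train_one_stage, num_experts):
    # Step 1: train one expert (theta_0) with real-valued weights and binary activations.
    single = model_factory(num_experts=1, binary_weights=False)   # hypothetical factory
    train_one_stage(single)
    theta_0 = copy.deepcopy(single.state_dict())

    # Step 2: replicate theta_0 to all experts to initialize the matrix Theta.
    multi = model_factory(num_experts=num_experts, binary_weights=False)
    for i in range(num_experts):
        multi.load_expert_weights(expert_idx=i, weights=theta_0)  # hypothetical helper

    # Step 3: train the replicated model, still with real weights and binary activations.
    train_one_stage(multi)

    # Step 4: switch to binary weights as well and train again.
    multi.set_binary_weights(True)                                # hypothetical toggle
    train_one_stage(multi)
    return multi
```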
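
The augmentation described in the "Dataset Splits" row (random scaling and cropping to 224×224px, following He et al. (2016)) matches the standard ImageNet recipe. A hedged torchvision sketch is given below; the horizontal flip, normalization statistics and validation-side preprocessing are conventional assumptions, not details quoted from the paper.

```python
from torchvision import transforms

# ImageNet normalization statistics (assumed; not stated in the excerpt).
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

# Training-time augmentation: random scaling and cropping to 224x224px.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(224),
    transforms.RandomHorizontalFlip(),  # common companion augmentation (assumed)
    transforms.ToTensor(),
    normalize,
])

# Validation-side preprocessing typically paired with this recipe (assumed).
val_transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    normalize,
])
```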
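
The optimizer and schedule quoted in the "Experiment Setup" row map directly onto standard PyTorch components. The sketch below wires up Adam with a learning rate of 10^-3, a step decay by 10 at epochs 40, 55 and 65, the per-stage weight decay (10^-5 in Stage I, 0 in Stage II), and a warm-up over the first 10 epochs; the linear ramp shape and the helper names are assumptions, since the excerpt only says a warm-up is applied.

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import MultiStepLR

def build_optimizer_and_scheduler(model, stage=1, base_lr=1e-3):
    # Stage I: weight decay 1e-5; Stage II: weight decay 0 (as quoted above).
    weight_decay = 1e-5 if stage == 1 else 0.0
    optimizer = Adam(model.parameters(), lr=base_lr, weight_decay=weight_decay)
    # Learning rate decreased by a factor of 10 at epochs 40, 55 and 65 (75 epochs total).
    scheduler = MultiStepLR(optimizer, milestones=[40, 55, 65], gamma=0.1)
    return optimizer, scheduler

def apply_warmup(optimizer, epoch, base_lr=1e-3, warmup_epochs=10):
    # Warm-up over the first 10 epochs, in the spirit of Goyal et al. (2017);
    # the exact (linear) ramp is an assumption.
    if epoch < warmup_epochs:
        lr = base_lr * (epoch + 1) / warmup_epochs
        for group in optimizer.param_groups:
            group["lr"] = lr
```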