Elastic-Link for Binarized Neural Networks

Authors: Jie Hu, Ziheng Wu, Vince Tan, Zhilin Lu, Mengze Zeng, Enhua Wu

AAAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To assess the effectiveness of EL, we conduct extensive experiments on the ImageNet dataset. We outperform the current state-of-the-art result with a top-1 accuracy of 68.9%. We also contribute comprehensive ablation studies and discussions to further understanding of the intrinsic characteristics of BNNs.
Researcher Affiliation | Collaboration | (1) State Key Lab of Computer Science, ISCAS & University of Chinese Academy of Sciences; (2) Department of Electronic Engineering, Tsinghua University; (3) Alibaba Group; (4) ByteDance Inc.; (5) University of Macau
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. It only mentions using a third-party library, BMXNet.
Open Datasets | Yes | The dataset consists of about 1.28 million training images and 50 thousand validation images, annotated for 1000 classes. ImageNet (Russakovsky et al. 2015).
Dataset Splits | Yes | The dataset consists of about 1.28 million training images and 50 thousand validation images, annotated for 1000 classes. During inference, we center-crop patches of size 224×224 from each image on the validation set and report the top-1 accuracy for comparison.
Hardware Specification | Yes | For a practical comparison, we use the BMXNet library (Yang et al. 2017) on an Intel Core i7-9700K CPU to measure the actual time taken.
Software Dependencies | No | The paper mentions using the 'BMXNet library (Yang et al. 2017)' but does not provide a specific version number for it or for other software dependencies.
Experiment Setup | Yes | Input images are resized such that the shorter edge is 256 pixels, then randomly cropped to 224×224 pixels, followed by a random horizontal flip. Mean channel subtraction is used to normalize the input. All networks are trained from scratch using the Adam optimizer without weight decay. The entire training process consists of 100 epochs with a mini-batch size of 256. The initial learning rate is set to 1e-3 and decreases after the 50th and 80th epochs by a factor of 10 each.
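The Dataset Splits and Experiment Setup rows together specify a complete ImageNet training recipe. The authors implement their pipeline with the BMXNet (MXNet) library; the sketch below is a minimal PyTorch-style illustration of the same preprocessing and schedule, not the authors' code. The torchvision-based data pipeline and the per-channel mean values are assumptions, since the paper excerpt does not state them.

```python
import torch
from torch import nn, optim
from torchvision import datasets, transforms

# Mean channel subtraction only (no std scaling), as described in the paper.
# Exact per-channel means are not given in the excerpt; standard ImageNet
# means are assumed here purely for illustration.
IMAGENET_MEAN = [0.485, 0.456, 0.406]

train_tf = transforms.Compose([
    transforms.Resize(256),              # shorter edge -> 256 px
    transforms.RandomCrop(224),          # random 224x224 crop
    transforms.RandomHorizontalFlip(),   # random horizontal flip
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=[1.0, 1.0, 1.0]),
])

val_tf = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),          # center crop for top-1 evaluation
    transforms.ToTensor(),
    transforms.Normalize(mean=IMAGENET_MEAN, std=[1.0, 1.0, 1.0]),
])

def build_training(model: nn.Module, imagenet_root: str):
    """Set up the reported schedule: Adam without weight decay, 100 epochs,
    mini-batch size 256, LR 1e-3 dropped 10x after epochs 50 and 80."""
    train_set = datasets.ImageFolder(f"{imagenet_root}/train", transform=train_tf)
    val_set = datasets.ImageFolder(f"{imagenet_root}/val", transform=val_tf)

    train_loader = torch.utils.data.DataLoader(
        train_set, batch_size=256, shuffle=True, num_workers=8, pin_memory=True)
    val_loader = torch.utils.data.DataLoader(
        val_set, batch_size=256, shuffle=False, num_workers=8, pin_memory=True)

    optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.0)
    scheduler = optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[50, 80], gamma=0.1)
    return train_loader, val_loader, optimizer, scheduler
```

The scheduler milestones mirror the stated decay points (epochs 50 and 80); stepping the scheduler once per epoch over 100 epochs reproduces the 1e-3 / 1e-4 / 1e-5 learning-rate phases described in the setup.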