Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization
Authors: Koen Helwegen, James Widdicombe, Lukas Geiger, Zechun Liu, Kwang-Ting Cheng, Roeland Nusselder
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | demonstrate its performance on CIFAR-10 and ImageNet. |
| Researcher Affiliation | Collaboration | 1Plumerai Research {koen, james, lukas, roeland}@plumerai.com 2Hong Kong University of Science and Technology zliubq@connect.ust.hk, timcheng@ust.hk |
| Pseudocode | Yes | Algorithm 1: Training procedure for BNNs using latent weights. Algorithm 2: Bop, an optimizer for BNNs. |
| Open Source Code | Yes | Code is available at: https://github.com/plumerai/rethinking-bnn-optimization. |
| Open Datasets | Yes | CIFAR-10 and ImageNet |
| Dataset Splits | No | The paper uses standard datasets like CIFAR-10 and ImageNet, which have predefined splits, but it does not explicitly state the dataset split percentages or sample counts used for training, validation, or testing within the text. |
| Hardware Specification | Yes | The experiments were conducted using TensorFlow [27] and NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The experiments were conducted using TensorFlow, but no specific version number for TensorFlow or any other software dependencies is provided. |
| Experiment Setup | Yes | To benchmark Bop we train for 500 epochs with threshold τ = 10⁻⁸, adaptivity rate γ = 10⁻⁴ decayed by 0.1 every 100 epochs, batch size 50, and use Adam with the recommended defaults for β1, β2, ϵ [14] and an initial learning rate of α = 10⁻² to update the real-valued variables in the Batch Normalization layers [28]. We train BinaryNet and Bi-Real Net for 150 epochs and XNOR-Net for 100 epochs. We use a batch size of 1024 and standard preprocessing with random flip and resize but no further augmentation. For all three networks we use the same optimizer hyperparameters. We set the threshold to 1×10⁻⁸ and decay the adaptivity rate linearly from 1×10⁻⁴ to 1×10⁻⁶. For the real-valued variables, we use Adam with a linearly decaying learning rate from 2.5×10⁻³ to 5×10⁻⁶ and otherwise default settings (β1 = 0.9, β2 = 0.999 and ϵ = 1×10⁻⁷). |
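
The hyperparameters quoted above (threshold τ and adaptivity rate γ) feed the weight-flip rule the paper defines in Algorithm 2 (Bop). The snippet below is a minimal NumPy sketch of that rule, assuming the update m ← (1−γ)m + γ∇ and a flip when |m| exceeds τ with the sign of m agreeing with the weight; the names `bop_step`, `w`, `m`, and `grad` are illustrative and not taken from the authors' released code (see the linked repository for the official implementation).

```python
import numpy as np

def bop_step(w, grad, m, gamma=1e-4, tau=1e-8):
    """One sketched Bop update on binary weights w in {-1, +1}.

    m is an exponential moving average of gradients; gamma (the
    "adaptivity rate") controls how quickly it adapts, and a weight is
    flipped only when |m| exceeds the threshold tau and the sign of m
    agrees with the current weight.
    """
    m = (1.0 - gamma) * m + gamma * grad                     # update gradient average
    flip = (np.abs(m) > tau) & (np.sign(m) == np.sign(w))    # flip condition
    w = np.where(flip, -w, w)                                 # flip selected weights
    return w, m

# Toy usage: one step on a small weight vector with the paper's default gamma and tau.
w = np.array([1.0, -1.0, 1.0])
m = np.zeros_like(w)
grad = np.array([0.5, -0.2, -0.3])
w, m = bop_step(w, grad, m, gamma=1e-4, tau=1e-8)
```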