Latent Weights Do Not Exist: Rethinking Binarized Neural Network Optimization
Authors: Koen Helwegen, James Widdicombe, Lukas Geiger, Zechun Liu, Kwang-Ting Cheng, Roeland Nusselder
NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | demonstrate its performance on CIFAR-10 and ImageNet. |
| Researcher Affiliation | Collaboration | 1Plumerai Research {koen, james, lukas, roeland}@plumerai.com 2Hong Kong University of Science and Technology zliubq@connect.ust.hk, timcheng@ust.hk |
| Pseudocode | Yes | Algorithm 1: Training procedure for BNNs using latent weights. Algorithm 2: Bop, an optimizer for BNNs. |
| Open Source Code | Yes | Code is available at: https://github.com/plumerai/rethinking-bnn-optimization. |
| Open Datasets | Yes | CIFAR-10 and ImageNet |
| Dataset Splits | No | The paper uses standard datasets like CIFAR-10 and ImageNet, which have predefined splits, but it does not explicitly state the dataset split percentages or sample counts used for training, validation, or testing within the text. |
| Hardware Specification | Yes | The experiments were conducted using TensorFlow [27] and NVIDIA Tesla V100 GPUs. |
| Software Dependencies | No | The experiments were conducted using TensorFlow, but no specific version number for TensorFlow or any other software dependencies is provided. |
| Experiment Setup | Yes | To benchmark Bop we train for 500 epochs with threshold τ = 10⁻⁸, adaptivity rate γ = 10⁻⁴ decayed by 0.1 every 100 epochs, batch size 50, and use Adam with the recommended defaults for β1, β2, ϵ [14] and an initial learning rate of α = 10⁻² to update the real-valued variables in the Batch Normalization layers [28]. We train BinaryNet and Bi-Real Net for 150 epochs and XNOR-Net for 100 epochs. We use a batch size of 1024 and standard preprocessing with random flip and resize but no further augmentation. For all three networks we use the same optimizer hyperparameters. We set the threshold to 1×10⁻⁸ and decay the adaptivity rate linearly from 1×10⁻⁴ to 1×10⁻⁶. For the real-valued variables, we use Adam with a linearly decaying learning rate from 2.5×10⁻³ to 5×10⁻⁶ and otherwise default settings (β1 = 0.9, β2 = 0.999 and ϵ = 1×10⁻⁷). |
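
The hyperparameters quoted above (threshold τ and adaptivity rate γ) feed the weight-flip rule the paper defines in Algorithm 2 (Bop). The snippet below is a minimal NumPy sketch of that rule, assuming the update m ← (1−γ)m + γ∇ and a flip when |m| exceeds τ with the sign of m agreeing with the weight; the names `bop_step`, `w`, `m`, and `grad` are illustrative and not taken from the authors' released code (see the linked repository for the official implementation).

```python
import numpy as np

def bop_step(w, grad, m, gamma=1e-4, tau=1e-8):
    """One sketched Bop update on binary weights w in {-1, +1}.

    m is an exponential moving average of gradients; gamma (the
    "adaptivity rate") controls how quickly it adapts, and a weight is
    flipped only when |m| exceeds the threshold tau and the sign of m
    agrees with the current weight.
    """
    m = (1.0 - gamma) * m + gamma * grad                     # update gradient average
    flip = (np.abs(m) > tau) & (np.sign(m) == np.sign(w))    # flip condition
    w = np.where(flip, -w, w)                                 # flip selected weights
    return w, m

# Toy usage: one step on a small weight vector with the paper's default gamma and tau.
w = np.array([1.0, -1.0, 1.0])
m = np.zeros_like(w)
grad = np.array([0.5, -0.2, -0.3])
w, m = bop_step(w, grad, m, gamma=1e-4, tau=1e-8)
```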