How Do Adam and Training Strategies Help BNNs Optimization?

Authors: Zechun Liu, Zhiqiang Shen, Shichao Li, Koen Helwegen, Dong Huang, Kwang-Ting Cheng

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments and analysis, we derive a simple training scheme, building on existing Adam-based optimization, which achieves 70.5% top-1 accuracy on the ImageNet dataset using the same architecture as the state-of-the-art ReActNet (Liu et al., 2020) while achieving 1.1% higher accuracy.
Researcher Affiliation | Collaboration | (1) Hong Kong University of Science and Technology, (2) Carnegie Mellon University, (3) Plumerai.
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code and models are available at https://github.com/liuzechun/AdamBNN.
Open Datasets | Yes | All the analytical experiments are conducted on the ImageNet 2012 classification dataset (Russakovsky et al., 2015).
Dataset Splits | No | The paper discusses 'validation accuracy' and 'training accuracy' but does not give the percentages or sample counts of the training, validation, or test splits needed for reproduction. Although ImageNet has standard splits, the paper neither states that these were used nor details its own split methodology.
Hardware Specification | No | The paper does not specify the hardware used for the experiments (e.g., GPU models, CPU types, memory); it mentions only that PyTorch was used, in the context of setting learning rates.
Software Dependencies | No | The paper mentions using PyTorch defaults for the initial learning rates: 'In this experiment, the initial learning rates for different optimizers are set to the PyTorch (Paszke et al., 2019) default values'. However, it does not provide a version number for PyTorch or any other software dependency.
Experiment Setup | Yes | We train the network for 600K iterations with batch size set to 512. The initial learning rate is set to 0.1 for SGD and 0.0025 for Adam, with linear learning rate decay.
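
The Experiment Setup row above quotes the paper's training hyperparameters (600K iterations, batch size 512, initial learning rate 0.1 for SGD or 0.0025 for Adam, linear decay). The snippet below is a minimal PyTorch sketch of how that optimizer and schedule could be wired up; the model, data, and the helper name build_optimizer_and_scheduler are illustrative placeholders rather than the paper's code, for which the released AdamBNN repository is the authoritative reference.

```python
# Hedged sketch of the reported setup: Adam (or SGD) with linear LR decay over
# the full run. Batch size 512 and 600K iterations are the paper's values; the
# model and data below are stand-ins.
import torch
import torch.nn as nn


def build_optimizer_and_scheduler(model, total_iters, use_adam=True):
    """Create the optimizer and a linearly decaying LR schedule (illustrative helper)."""
    if use_adam:
        optimizer = torch.optim.Adam(model.parameters(), lr=0.0025)  # paper's Adam LR
    else:
        optimizer = torch.optim.SGD(model.parameters(), lr=0.1)      # paper's SGD LR
    # Linear decay from the initial LR toward zero over total_iters steps.
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lr_lambda=lambda step: max(0.0, 1.0 - step / total_iters)
    )
    return optimizer, scheduler


if __name__ == "__main__":
    # Placeholder model and random batches; the paper trains a binary ReActNet
    # on ImageNet with batch size 512 for 600K iterations.
    model = nn.Linear(3 * 224 * 224, 1000)
    criterion = nn.CrossEntropyLoss()
    optimizer, scheduler = build_optimizer_and_scheduler(model, total_iters=600_000)

    for step in range(10):  # a few steps only, for demonstration
        images = torch.randn(4, 3 * 224 * 224)
        labels = torch.randint(0, 1000, (4,))
        loss = criterion(model(images), labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        scheduler.step()  # advance the linear decay once per iteration
```

Stepping the scheduler once per iteration, rather than once per epoch, matches the iteration-based schedule quoted in the Experiment Setup row.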