How Do Adam and Training Strategies Help BNNs Optimization?
Authors: Zechun Liu, Zhiqiang Shen, Shichao Li, Koen Helwegen, Dong Huang, Kwang-Ting Cheng
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments and analysis, we derive a simple training scheme, building on existing Adam-based optimization, which achieves 70.5% top-1 accuracy on the ImageNet dataset using the same architecture as the state-of-the-art ReActNet (Liu et al., 2020) while achieving 1.1% higher accuracy. |
| Researcher Affiliation | Collaboration | ¹Hong Kong University of Science and Technology, ²Carnegie Mellon University, ³Plumerai. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code and models are available at https://github.com/liuzechun/AdamBNN. |
| Open Datasets | Yes | All the analytical experiments are conducted on the ImageNet 2012 classification dataset (Russakovsky et al., 2015). |
| Dataset Splits | No | The paper discusses 'validation accuracy' and 'training accuracy' but does not explicitly provide the specific percentages or sample counts for its training, validation, or test dataset splits needed for reproduction. While ImageNet has standard splits, the paper does not state that these were specifically used or detail its own split methodology. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU models, CPU types, memory) used to run the experiments. It only mentions that PyTorch default values were used for the initial learning rates. |
| Software Dependencies | No | The paper mentions using PyTorch for setting initial learning rates: 'In this experiment, the initial learning rates for different optimizers are set to the PyTorch (Paszke et al., 2019) default values'. However, it does not provide a specific version number for PyTorch or any other software dependency. |
| Experiment Setup | Yes | We train the network for 600K iterations with batch size set to 512. The initial learning rate is set to 0.1 for SGD and 0.0025 for Adam, with linear learning rate decay. |
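The Experiment Setup row above quotes the only optimization details the paper reports: 600K iterations, batch size 512, initial learning rate 0.1 for SGD or 0.0025 for Adam, and linear learning-rate decay. Below is a minimal PyTorch sketch of that schedule, assuming a standard per-iteration training loop; the toy model, synthetic batches, and loss are stand-ins (the actual experiments use a ReActNet-style binary network on ImageNet), and only the quoted hyperparameters come from the paper.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import LambdaLR

# Hyperparameters quoted from the paper's experiment setup.
TOTAL_ITERS = 600_000          # training iterations
BATCH_SIZE = 512               # batch size
INIT_LR_ADAM = 0.0025          # initial learning rate for Adam (0.1 for SGD)

# Stand-in model and loss; the paper's binary network is not reproduced here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 1000))
criterion = nn.CrossEntropyLoss()

optimizer = torch.optim.Adam(model.parameters(), lr=INIT_LR_ADAM)
# To compare against SGD as in the paper:
# optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Linear learning-rate decay from the initial value down to zero over the run.
scheduler = LambdaLR(optimizer, lr_lambda=lambda it: 1.0 - it / TOTAL_ITERS)

for it in range(TOTAL_ITERS):
    images = torch.randn(BATCH_SIZE, 3, 32, 32)    # synthetic batch as a placeholder
    labels = torch.randint(0, 1000, (BATCH_SIZE,))
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    scheduler.step()                               # decay once per iteration
```

Note that the scheduler is stepped per iteration rather than per epoch, which is the natural reading of "600K iterations with linear learning rate decay"; the paper itself does not spell out the decay granularity.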