Robust Binary Models by Pruning Randomly-initialized Networks

Authors: Chen Liu, Ziqi Zhao, Sabine Süsstrunk, Mathieu Salzmann

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments on standard benchmarks to confirm the effectiveness of our method. Our experiments demonstrate that our approach not only always outperforms the state-of-the-art robust binary networks, but also can achieve accuracy better than full-precision ones on some datasets.
Researcher Affiliation | Academia | École Polytechnique Fédérale de Lausanne, Lausanne, Switzerland, {chen.liu, ziqi.zhao, sabine.susstrunk, mathieu.salzmann}@epfl.ch
Pseudocode | Yes | We provide the pseudo-code of our algorithm in Appendix C.
Open Source Code | Yes | Our code is available at https://github.com/IVRL/RobustBinarySubNet.
Open Datasets | Yes | We use the CIFAR10 dataset [36] in the ablation study; we also use the CIFAR100 dataset [36] and the ImageNet100 dataset [16, 20] in the comparisons with the baselines.
Dataset Splits | No | The paper mentions training and testing on the CIFAR10, CIFAR100, and ImageNet100 datasets, but it does not explicitly provide the training/validation/test splits used (e.g., percentages or sample counts), nor does it cite a source for these splits in the paper or supplementary material.
Hardware Specification | Yes | All the experiments were conducted on NVIDIA A100 GPUs.
Software Dependencies | No | The paper mentions using PyTorch for dataset download and the SGD optimizer, but it does not specify version numbers for any software dependencies such as PyTorch, Python, or other libraries.
Experiment Setup | Yes | We train the models for 400 epochs on CIFAR10/100 and 100 epochs on ImageNet100. We use a cosine annealing learning rate scheduler with an initial value of 0.1. We use an l∞ norm-based adversarial budget, and the perturbation strength is 8/255 for CIFAR10, 4/255 for CIFAR100 and 2/255 for ImageNet100. For CIFAR10/100, we use a batch size of 128... We use the SGD optimizer [35] with a momentum of 0.9 and a weight decay of 5e-4.
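
For reference, below is a minimal PyTorch sketch of the reported CIFAR10 training configuration (SGD with momentum 0.9 and weight decay 5e-4, cosine annealing from an initial learning rate of 0.1, 400 epochs, batch size 128, l∞ budget of 8/255). The backbone, PGD attack settings, and data pipeline are assumptions made for illustration; this is not the authors' released code, which learns a binary subnetwork by pruning a randomly-initialized network (see the repository linked above).

```python
# Sketch of the reported CIFAR10 adversarial-training configuration.
# Assumptions (not from the paper): ResNet-18 backbone, 10-step PGD adversary,
# standard torchvision data pipeline; PyTorch version is unspecified in the paper.
import torch
import torch.nn as nn
import torchvision
import torchvision.transforms as T
from torch.utils.data import DataLoader

EPOCHS = 400        # reported for CIFAR10/100
BATCH_SIZE = 128    # reported for CIFAR10/100
EPSILON = 8 / 255   # reported l-infinity budget for CIFAR10

train_set = torchvision.datasets.CIFAR10(
    root="./data", train=True, download=True, transform=T.ToTensor())
train_loader = DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet18(num_classes=10).to(device)  # placeholder backbone
criterion = nn.CrossEntropyLoss()

# Reported optimizer and schedule: SGD (momentum 0.9, weight decay 5e-4),
# cosine annealing starting from a learning rate of 0.1.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=EPOCHS)

def pgd_attack(x, y, steps=10, step_size=2 / 255):
    """l-infinity PGD adversary; step count and step size are assumptions."""
    delta = torch.empty_like(x).uniform_(-EPSILON, EPSILON).requires_grad_(True)
    for _ in range(steps):
        loss = criterion(model(x + delta), y)
        grad, = torch.autograd.grad(loss, delta)
        delta = (delta + step_size * grad.sign()).clamp(-EPSILON, EPSILON)
        delta = ((x + delta).clamp(0, 1) - x).detach().requires_grad_(True)
    return (x + delta).detach()

for epoch in range(EPOCHS):
    for x, y in train_loader:
        x, y = x.to(device), y.to(device)
        x_adv = pgd_attack(x, y)              # adversarial example for this batch
        optimizer.zero_grad()
        criterion(model(x_adv), y).backward() # train on adversarial inputs
        optimizer.step()
    scheduler.step()
```

The sketch omits robust-accuracy evaluation and the score-based subnetwork selection that constitutes the paper's actual contribution; it only mirrors the optimizer, schedule, and budget values quoted in the table above.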