Training Adversarially Robust Sparse Networks via Bayesian Connectivity Sampling
Authors: Ozan Özdenizci, Robert Legenstein
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We compare our approach with sparse learning baselines in Section 4.1, and state-of-the-art robustness-aware pruning methods in Section 4.2. Dataset and model specifications, as well as training and evaluation details are described below. Datasets & Model Architectures: We perform experiments with three benchmark datasets: CIFAR-10 and CIFAR-100 (Krizhevsky, 2009), and SVHN (Netzer et al., 2011) (see Appendix A.1 for further details). |
| Researcher Affiliation | Collaboration | ¹Graz University of Technology, Institute of Theoretical Computer Science, Graz, Austria. ²Silicon Austria Labs, TU Graz SAL Dependable Embedded Systems Lab, Graz, Austria. |
| Pseudocode | Yes | Algorithm 1 Robust end-to-end sparse training |
| Open Source Code | Yes | Our code is available at: https://github.com/IGITUGraz/SparseAdversarialTraining. |
| Open Datasets | Yes | We perform experiments with three benchmark datasets: CIFAR-10 and CIFAR-100 (Krizhevsky, 2009), and SVHN (Netzer et al., 2011) (see Appendix A.1 for further details). |
| Dataset Splits | No | The paper mentions training on datasets and evaluating on test sets but does not specify a dedicated validation dataset split with percentages or counts. |
| Hardware Specification | No | No specific hardware details (e.g., CPU/GPU models, memory, or cloud instances) are provided for the experimental setup. |
| Software Dependencies | No | The paper names some of the tools and methods used (e.g., the Foolbox Native benchmark for robustness evaluation, SGD with momentum and decoupled weight decay for training) but does not specify version numbers for any software dependencies. |
| Experiment Setup | Yes | All models were trained for 200 epochs with a batch size of 128. Only for models trained with RST the batch size was set to 256, while keeping the total number of iterations the same. We used piecewise constant decay learning rate and weight decay schedulers. Initial learning rates were set to 0.1 and were divided by 10 at 100th and 150th epochs. Network weights were initialized via Kaiming initialization (He et al., 2015). |
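
To make the quoted experiment setup concrete, below is a minimal PyTorch sketch of the training schedule. This assumes a PyTorch-based implementation; the model, data loader, momentum, and weight-decay values are placeholders not stated in the quote, and the adversarial-training and connectivity-sampling steps of the paper's Algorithm 1 are omitted.

```python
# Minimal sketch of the quoted schedule: 200 epochs, batch size 128, SGD,
# LR 0.1 divided by 10 at epochs 100 and 150, Kaiming initialization.
import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader, TensorDataset

def kaiming_init(module):
    # "Network weights were initialized via Kaiming initialization (He et al., 2015)."
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

# Placeholder model and data; the paper trains VGG/ResNet-style networks on CIFAR/SVHN.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
model.apply(kaiming_init)
train_loader = DataLoader(
    TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,))),
    batch_size=128,  # "batch size of 128" (256 for the RST runs)
    shuffle=True,
)

# Momentum and weight-decay values are assumptions, not given in the quote.
# Note: torch.optim.SGD applies coupled L2 regularization, whereas the paper mentions
# decoupled weight decay with its own piecewise-constant scheduler.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# "Initial learning rates were set to 0.1 and were divided by 10 at 100th and 150th epochs."
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

criterion = nn.CrossEntropyLoss()
for epoch in range(200):  # "All models were trained for 200 epochs"
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

This sketch only reproduces the optimizer and scheduler settings reported in the table; the adversarial loss and the Bayesian connectivity-sampling updates described in the paper would replace the plain cross-entropy step above.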