LOT: Layer-wise Orthogonal Training on Improving ℓ2 Certified Robustness
Authors: Xiaojun Xu, Linyi Li, Bo Li
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct comprehensive evaluations for LOT under different settings. We show that LOT significantly outperforms baselines regarding deterministic ℓ2 certified robustness, and scales to deeper neural networks. We conduct comprehensive experiments to evaluate our approach. |
| Researcher Affiliation | Academia | Xiaojun Xu, Linyi Li, Bo Li, University of Illinois Urbana-Champaign, {xiaojun3, linyi2, lbo}@illinois.edu |
| Pseudocode | Yes | The detailed algorithm is shown in Appendix B. |
| Open Source Code | Yes | The code is available at https://github.com/AI-secure/Layerwise-Orthogonal-Training. |
| Open Datasets | Yes | We focus on the CIFAR-10 and CIFAR-100 datasets. In semi-supervised learning, we use the 500K data introduced in [4] as the unlabelled dataset. |
| Dataset Splits | No | The paper mentions training on CIFAR-10 and CIFAR-100 datasets and evaluating on a 'testing set', but does not explicitly provide details about the training/validation/test dataset splits (e.g., percentages or sample counts for each split). |
| Hardware Specification | Yes | For the evaluation time comparison, we show the runtime taken to do a full pass on the testing set evaluated on an NVIDIA RTX A6000 GPU. |
| Software Dependencies | No | The paper describes its methods and training parameters but does not specify software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | To train the LOT network, we will train the model for 200 epochs using a momentum SGD optimizer with an initial learning rate 0.1 and decay by 0.1 at the 100-th and 150-th epochs. We use Newton's iteration with 10 steps which we observe is enough for convergence (see Appendix E.4). When CReg loss is applied, we use γ = 0.5; when HH activation is applied, we use the version of order 1. We add the residual connection with a fixed λ = 0.5 for LOT; for SOC, we use their original version, as we observe that residual connections even hurt their performance (see discussions in Section 6.3). (A minimal training-schedule sketch follows the table.) |
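
The training configuration quoted in the Experiment Setup row can be summarized in a minimal PyTorch sketch. This is not the authors' released implementation (see the linked repository for that): the momentum value, the placeholder model, and the random data are assumptions made for illustration, and the Newton-Schulz routine below is a generic textbook variant of the 10-step Newton's iteration mentioned in the quote, not necessarily the exact formulation used in LOT.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def inverse_sqrt_newton(A, num_iters=10):
    """Coupled Newton-Schulz iteration approximating A^{-1/2} for a symmetric
    positive-definite matrix A, run for 10 steps as in the quoted setup.
    Generic textbook variant; the paper's exact formulation may differ."""
    norm = A.norm()                      # normalize so the iteration converges
    Y = A / norm
    I = torch.eye(A.shape[0], device=A.device, dtype=A.dtype)
    Z = I.clone()
    for _ in range(num_iters):
        T = 0.5 * (3.0 * I - Z @ Y)
        Y = Y @ T                        # Y -> (A/||A||)^{1/2}
        Z = T @ Z                        # Z -> (A/||A||)^{-1/2}
    return Z / norm.sqrt()


if __name__ == "__main__":
    # Placeholder model and random batches stand in for the LOT network and
    # CIFAR-10; only the optimizer/schedule reflects the quoted setup.
    model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)  # momentum value assumed
    scheduler = torch.optim.lr_scheduler.MultiStepLR(
        optimizer, milestones=[100, 150], gamma=0.1)  # decay by 0.1 at epochs 100 and 150

    for epoch in range(200):             # 200 epochs, as quoted
        x = torch.randn(8, 3, 32, 32)    # dummy batch to exercise the schedule
        y = torch.randint(0, 10, (8,))
        optimizer.zero_grad()
        loss = F.cross_entropy(model(x), y)
        loss.backward()
        optimizer.step()
        scheduler.step()
```

As a usage note, `inverse_sqrt_newton` illustrates the kind of matrix inverse-square-root computation that a layer-wise orthogonal parameterization needs at each forward pass; in practice it would be applied to the (reshaped) weight of each layer rather than called standalone.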