Bort: Towards Explainable Neural Networks with Bounded Orthogonal Constraint

Authors: Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We perform reconstruction and backtracking on the model representations optimized by Bort and observe a clear improvement in model explainability. Based on Bort, we are able to synthesize explainable adversarial samples without additional parameters or training. Surprisingly, we find Bort consistently improves the classification accuracy of various architectures, including ResNet and DeiT, on MNIST, CIFAR-10, and ImageNet. Code: https://github.com/zbr17/Bort.
Researcher Affiliation | Academia | Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu. Department of Automation, Tsinghua University, China; Beijing National Research Center for Information Science and Technology, China. {zhang-br21, zhengwz18}@mails.tsinghua.edu.cn; {jzhou, lujiwen}@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: The SAT algorithm. Input: the top feature map Z, the backtracking mapping g, number k, constant B, and threshold γ. Output: saliency map A.
Open Source Code | Yes | Code: https://github.com/zbr17/Bort.
Open Datasets | Yes | We conduct classification experiments on MNIST, CIFAR-10, and ImageNet... To begin with, we test Bort on MNIST (Deng, 2012) and CIFAR-10 (Krizhevsky et al., 2009). We evaluate Bort on the large-scale ImageNet (Deng et al., 2009)...
Dataset Splits | No | The paper references standard datasets (MNIST, CIFAR-10, ImageNet) but does not explicitly provide train/validation/test splits by percentage, by absolute sample counts, or by citation of predefined splits.
Hardware Specification | Yes | "All experiments are conducted on one NVIDIA 3090 card." / "All experiments are conducted on 8 A100 cards." (The paper makes both statements, for different sets of experiments.)
Software Dependencies | No | The paper mentions software such as "PyTorch image models" but does not provide version numbers for any software components, libraries, or solvers used in the experiments.
Experiment Setup | Yes | We set the learning rate to 0.01 without any learning rate adjustment schedule and train each model for 40 epochs with batch size fixed to 256. No data augmentation strategy is utilized. The constraint coefficient is set to 0.1, and the weight decay is set to 0.01. For training CNN-type models (i.e., VGG16 and ResNet50), we follow the recipe in public codes (Wightman, 2019). We set the learning rate to 0.05 for SGD, 0.001 for AdamW, and 0.005 for LAMB. We utilize 3-split data augmentation including RandAugment (Cubuk et al., 2020) and Random Erasing. We train the model for 300 epochs with the batch size set to 1024 for SGD and AdamW and 2048 for LAMB. For LAMB, the weight decay is 0.002 and the λ coefficient is 0.00002; for SGD and AdamW, we set the weight decay to 0.00002 and the λ coefficient to 0.0001.
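The title's "bounded orthogonal constraint" suggests a weight regularizer that pushes the rows of each weight matrix toward mutual orthogonality while bounding their norms. The paper defines the exact form; the snippet below is only a minimal NumPy sketch under that reading, and the function name `bort_penalty` and the penalty's precise shape (squared off-diagonal Gram entries plus a hinge on the diagonal) are illustrative assumptions, not the paper's formula.

```python
import numpy as np

def bort_penalty(W, B=1.0):
    """Sketch of a bounded-orthogonality regularizer (assumed form, not the
    paper's exact definition): penalize non-zero off-diagonal entries of the
    row Gram matrix (non-orthogonality) and diagonal entries above B**2
    (row norms exceeding the bound B)."""
    G = W @ W.T                            # Gram matrix of the rows of W
    off_diag = G - np.diag(np.diag(G))     # orthogonality residual
    ortho = np.sum(off_diag ** 2)
    bound = np.sum(np.maximum(np.diag(G) - B ** 2, 0.0) ** 2)
    return ortho + bound
```

An orthonormal matrix such as the identity incurs zero penalty, while correlated, over-long rows are penalized; in training this term would be scaled by a coefficient (the quoted setup mentions a constraint coefficient of 0.1) and added to the task loss.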
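The pseudocode row only quotes the header of Algorithm 1 (SAT), which takes the top feature map Z, a backtracking mapping g, a number k, a constant B, and a threshold γ, and outputs a saliency map A. A hedged NumPy sketch of that general shape follows: keep the k strongest activations of Z, backtrack them through g, and binarize at γ. The function name `sat_saliency` and the specific masking and thresholding steps are assumptions for illustration, not the paper's exact algorithm.

```python
import numpy as np

def sat_saliency(Z, g, k=5, B=1.0, gamma=0.5):
    """Sketch of a SAT-style backtracking saliency map (assumed steps):
    Z      -- top feature map
    g      -- backtracking mapping from feature space to input space
    k      -- number of activations to keep
    B      -- bound clamped onto the kept activations
    gamma  -- relative threshold for binarizing the saliency map"""
    flat = np.abs(Z).ravel()
    kth = np.sort(flat)[-k]                          # k-th largest magnitude
    Z_top = np.where(np.abs(Z) >= kth, Z, 0.0)       # keep top-k activations
    X = g(np.clip(Z_top, -B, B))                     # backtrack to input space
    A = (np.abs(X) >= gamma * np.abs(X).max()).astype(float)
    return A
```

With g as the identity, the map simply highlights the top-k activations; in the paper's setting g would invert the network's forward mappings back to pixel space.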
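The quoted setup interleaves two recipes (a small-scale MNIST/CIFAR-10 recipe and a per-optimizer ImageNet recipe). Collecting the reported hyperparameters in one place makes them easier to scan; the values below are transcribed from the quote, while the dictionary and key names are my own shorthand.

```python
# Hyperparameters as reported in the quoted experiment setup.
# Key names are shorthand for this sketch, not terms from the paper.
mnist_cifar_recipe = {
    "lr": 0.01, "epochs": 40, "batch_size": 256,
    "constraint_coeff": 0.1, "weight_decay": 0.01,
    "augmentation": None,  # "No data augmentation strategy is utilized."
}

imagenet_recipes = {  # 300 epochs, 3-split augmentation for all optimizers
    "SGD":   {"lr": 0.05,  "weight_decay": 2e-5,  "lambda": 1e-4, "batch_size": 1024},
    "AdamW": {"lr": 0.001, "weight_decay": 2e-5,  "lambda": 1e-4, "batch_size": 1024},
    "LAMB":  {"lr": 0.005, "weight_decay": 0.002, "lambda": 2e-5, "batch_size": 2048},
}
```

Note the inversion between the LAMB and SGD/AdamW rows: LAMB pairs the larger weight decay (0.002) with the smaller λ (0.00002), and vice versa.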