Leveraging Sparse Linear Layers for Debuggable Deep Networks

Authors: Eric Wong, Shibani Santurkar, Aleksander Madry

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate that it is possible to construct deep networks that have sparse decision layers (e.g., with only 20-30 deep features per class for ImageNet) without sacrificing much model performance. This involves developing a custom solver for fitting elastic net regularized linear models in order to perform effective sparsification at deep-learning scales. We show that sparsifying a network's decision layer can indeed help humans understand the resulting models better. For example, untrained annotators can intuit (simulate) the predictions of a model with a sparse decision layer with high (~63%) accuracy. This is in contrast to their near-chance performance (~33%) for models with standard (dense) decision layers. We explore the use of sparse decision layers in three debugging tasks: diagnosing biases and spurious correlations (Section 4.1), counterfactual generation (Section 4.2), and identifying data patterns that cause misclassifications (Section 4.3). To enable this analysis, we design a suite of human-in-the-loop experiments.
Researcher Affiliation | Academia | Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. Correspondence to: Eric Wong <wongeric@mit.edu>, Shibani Santurkar <shibani@mit.edu>.
Pseudocode | No | The paper describes algorithms and methods but does not provide formal pseudocode blocks or algorithms.
Open Source Code | Yes | (1) The code for our toolkit can be found at https://github.com/madrylab/debuggabledeepnetworks. (2) A standalone package of our solver is available at https://github.com/madrylab/glm_saga
Open Datasets | Yes | We perform our analysis on: (a) ResNet-50 classifiers (He et al., 2016) trained on ImageNet-1k (Deng et al., 2009; Russakovsky et al., 2015) and Places-10 (a 10-class subset of Places365 (Zhou et al., 2017)); and (b) BERT (Devlin et al., 2018) for sentiment classification on Stanford Sentiment Treebank (SST) (Socher et al., 2013) and toxicity classification of Wikipedia comments (Wulczyn et al., 2017).
Dataset Splits | Yes | For the rest of our study, we select a single sparse decision layer to balance performance and sparsity: specifically, the sparsest model whose accuracy is within 5% of top validation set performance (details in Appendix D.1.1).
Hardware Specification | No | The paper does not provide specific hardware specifications (e.g., GPU model, CPU type) used for experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x).
Experiment Setup | Yes | For the rest of our study, we select a single sparse decision layer to balance performance and sparsity: specifically, the sparsest model whose accuracy is within 5% of top validation set performance (details in Appendix D.1.1).
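
The experiment-setup procedure quoted above (fitting an elastic-net-regularized sparse decision layer over fixed deep features, then selecting the sparsest model whose validation accuracy is within 5% of the best) can be illustrated with a minimal sketch. This is not the authors' implementation: it substitutes scikit-learn's elastic-net logistic regression for their glm_saga solver, and the feature arrays, function name, and hyperparameters shown here are hypothetical placeholders.

```python
# Minimal sketch (assumption: scikit-learn stands in for the paper's glm_saga
# solver). Fits elastic-net-regularized linear decision layers on frozen deep
# features and returns the sparsest one within 5% of top validation accuracy.
import numpy as np
from sklearn.linear_model import LogisticRegression

def fit_sparse_decision_layer(train_feats, train_labels, val_feats, val_labels,
                              l1_ratio=0.99, n_lambdas=10, tolerance=0.05):
    """Sweep regularization strengths; return (sparsity, accuracy, model) for
    the sparsest model within `tolerance` of the best validation accuracy."""
    results = []
    for c in np.logspace(-3, 1, n_lambdas):  # smaller C = stronger regularization
        clf = LogisticRegression(penalty="elasticnet", solver="saga",
                                 l1_ratio=l1_ratio, C=c, max_iter=2000)
        clf.fit(train_feats, train_labels)
        acc = clf.score(val_feats, val_labels)
        # average number of nonzero deep features used per class
        sparsity = np.mean(np.count_nonzero(clf.coef_, axis=1))
        results.append((sparsity, acc, clf))
    best_acc = max(acc for _, acc, _ in results)
    eligible = [r for r in results if r[1] >= best_acc - tolerance]
    return min(eligible, key=lambda r: r[0])

# Hypothetical usage with pre-extracted penultimate-layer features of shape
# (n_examples, n_features), e.g., from a ResNet-50:
# sparsity, acc, sparse_layer = fit_sparse_decision_layer(
#     train_feats, train_labels, val_feats, val_labels)
# print(f"{sparsity:.1f} features/class at {acc:.1%} validation accuracy")
```

Setting l1_ratio close to 1 favors L1-style sparsity while keeping a small L2 component, which is the usual motivation for elastic net over a pure lasso penalty.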