Leveraging Sparse Linear Layers for Debuggable Deep Networks
Authors: Eric Wong, Shibani Santurkar, Aleksander Madry
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that it is possible to construct deep networks that have sparse decision layers (e.g., with only 20-30 deep features per class for ImageNet) without sacrificing much model performance. This involves developing a custom solver for fitting elastic-net-regularized linear models in order to perform effective sparsification at deep-learning scales (see the elastic-net sketch after this table). We show that sparsifying a network's decision layer can indeed help humans understand the resulting models better. For example, untrained annotators can intuit (simulate) the predictions of a model with a sparse decision layer with high (~63%) accuracy. This is in contrast to their near-chance performance (~33%) for models with standard (dense) decision layers. We explore the use of sparse decision layers in three debugging tasks: diagnosing biases and spurious correlations (Section 4.1), counterfactual generation (Section 4.2), and identifying data patterns that cause misclassifications (Section 4.3). To enable this analysis, we design a suite of human-in-the-loop experiments. |
| Researcher Affiliation | Academia | Massachusetts Institute of Technology, Cambridge, Massachusetts, USA. Correspondence to: Eric Wong <wongeric@mit.edu>, Shibani Santurkar <shibani@mit.edu>. |
| Pseudocode | No | The paper describes its algorithms and methods in prose but does not provide formal pseudocode or algorithm blocks. |
| Open Source Code | Yes | The code for our toolkit can be found at https://github.com/madrylab/debuggabledeepnetworks. A standalone package of our solver is available at https://github.com/madrylab/glm_saga |
| Open Datasets | Yes | We perform our analysis on: (a) ResNet-50 classifiers (He et al., 2016) trained on ImageNet-1k (Deng et al., 2009; Russakovsky et al., 2015) and Places-10 (a 10-class subset of Places365 (Zhou et al., 2017)); and (b) BERT (Devlin et al., 2018) for sentiment classification on Stanford Sentiment Treebank (SST) (Socher et al., 2013) and toxicity classification of Wikipedia comments (Wulczyn et al., 2017). |
| Dataset Splits | Yes | For the rest of our study, we select a single sparse decision layer to balance performance and sparsity: specifically, the sparsest model whose accuracy is within 5% of top validation set performance (details in Appendix D.1.1). |
| Hardware Specification | No | The paper does not provide specific hardware specifications (e.g., GPU model, CPU type) used for experiments. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x). |
| Experiment Setup | Yes | For the rest of our study, we select a single sparse decision layer to balance performance and sparsity: specifically, the sparsest model whose accuracy is within 5% of top validation set performance (details in Appendix D.1.1). A minimal selection sketch follows this table. |
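
For context on the "Research Type" row above: the paper's core sparsification step fits an elastic-net-regularized linear decision layer on frozen deep features. Below is a minimal sketch of that idea using scikit-learn's SAGA solver as a stand-in for the paper's custom glm_saga package; the feature/label file names are hypothetical, and the hyperparameter values are illustrative rather than the paper's.

```python
# Minimal sketch: fit an elastic-net-regularized sparse decision layer
# on precomputed deep features. Uses scikit-learn's SAGA solver in place
# of the paper's glm_saga package.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Assumed inputs (hypothetical files): penultimate-layer features from a
# frozen network, e.g. shape (n_samples, 2048) for ResNet-50, plus labels.
features = np.load("train_features.npy")
labels = np.load("train_labels.npy")

# Elastic net combines L1 and L2 penalties; l1_ratio controls the mix and
# C the overall regularization strength (smaller C -> sparser weights).
clf = LogisticRegression(
    penalty="elasticnet",
    solver="saga",      # the solver in scikit-learn that supports elastic net
    l1_ratio=0.99,      # mostly L1, to encourage sparsity (illustrative value)
    C=0.01,             # illustrative strength, not the paper's setting
    max_iter=1000,
)
clf.fit(features, labels)

# Count surviving features per class; the paper reports sparse decision
# layers with only ~20-30 nonzero deep features per class on ImageNet.
nonzero_per_class = (clf.coef_ != 0).sum(axis=1)
print(nonzero_per_class)
```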
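
The "Dataset Splits" and "Experiment Setup" rows both quote the paper's model-selection rule: along the regularization path, take the sparsest decision layer whose validation accuracy is within 5% of the best accuracy achieved. A small sketch of that rule, assuming a list of path entries with illustrative "acc"/"nnz" fields and reading "within 5%" as an absolute accuracy gap:

```python
# Sketch of the selection rule: pick the sparsest model on the
# regularization path whose validation accuracy is within 5% of the
# best accuracy anywhere on the path. Field names are assumptions.
def select_sparse_layer(path):
    """path: list of dicts with validation accuracy 'acc' and number of
    nonzero decision-layer weights 'nnz', one per regularization value."""
    best_acc = max(entry["acc"] for entry in path)
    eligible = [e for e in path if e["acc"] >= best_acc - 0.05]
    return min(eligible, key=lambda e: e["nnz"])

# Example: the middle entry wins because it is the sparsest model whose
# accuracy (0.70) is within 5% of the best accuracy (0.74).
path = [
    {"acc": 0.74, "nnz": 500},
    {"acc": 0.70, "nnz": 30},
    {"acc": 0.60, "nnz": 10},
]
print(select_sparse_layer(path))  # -> {'acc': 0.70, 'nnz': 30}
```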