Lazy Estimation of Variable Importance for Large Neural Networks
Authors: Yue Gao, Abby Stevens, Garvesh Raskutti, Rebecca Willett
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate through simulations that our method is fast and accurate under several data-generating regimes, and we demonstrate its real-world applicability on a seasonal climate forecasting example. |
| Researcher Affiliation | Academia | 1) Department of Statistics, University of Wisconsin, Madison; 2) Department of Statistics, University of Chicago; 3) Department of Computer Science, University of Chicago. |
| Pseudocode | Yes | Algorithm 1 Lazy training for VI |
| Open Source Code | Yes | Our implementation is available at https://github.com/Willett-Group/lazyvi. |
| Open Datasets | Yes | We use simulations from the Community Earth System Model-Large Ensemble project (CESM-LENS; Kay et al. (2015); de La Beaujardière et al. (2019)). |
| Dataset Splits | Yes | We estimate h_{θ̂_f} and h_{θ̂_{-j}} using n1 < n samples as training data, and use the remaining n2 = n − n1 samples to estimate VI. For the lazy training method, which we call Lazy VI, we use the training data to estimate the full model parameters, compute the gradient of the network with respect to each model parameter for each training sample, and then regress these gradients against the difference between Y and the dropout estimates from the training data to estimate the parameter correction ∆̂_j for variable j. We then update the full model parameters using this learned correction to compute the VI estimate and its associated standard errors. See Algorithm 1 for full details. Theorem 4.4 makes the assumption that the ridge parameter λ from Equation (15) is large. Since we are ultimately interested in estimating h_{θ̂_{-j}} and not ∆_j, we evaluate h_{θ̂_f + ∆̂_j(λ)} through K-fold CV to choose λ̂_j for each variable (Algorithm 2 in Appendix C.2). |
| Hardware Specification | No | The paper discusses training neural networks but does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory. |
| Software Dependencies | No | The paper mentions that its implementation is available on GitHub, implying specific software dependencies, but it does not explicitly list software names with their version numbers within the text. |
| Experiment Setup | Yes | For these experiments, we train a wide, fully connected two-layer neural network with ReLU activation for all simulations. Unless otherwise specified, the width of the hidden layer in the training network is m = 50. |
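
The Experiment Setup row above pins down only the architecture: a wide, fully connected two-layer network with ReLU activation and hidden width m = 50. Below is a minimal PyTorch sketch of such a network; the class name, input dimension, and scalar output are illustrative assumptions, not details taken from the paper or its repository.

```python
import torch
import torch.nn as nn

class TwoLayerNet(nn.Module):
    """Wide, fully connected two-layer ReLU network (hidden width m = 50)."""
    def __init__(self, in_dim: int, width: int = 50):
        super().__init__()
        self.hidden = nn.Linear(in_dim, width)   # input -> hidden layer of width m
        self.out = nn.Linear(width, 1)           # hidden -> scalar prediction

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.relu(self.hidden(x)))
```

The `in_dim` argument and the single regression output are placeholders; the paper's simulations vary the input dimension and the data-generating regime.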
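
The Dataset Splits row summarizes the Lazy VI procedure: train the full network on n1 samples, compute per-sample parameter gradients, ridge-regress the dropout residuals on those gradients to obtain a parameter correction, and evaluate the resulting variable-importance estimate on the remaining n2 samples. The sketch below is one plausible reading of that description, assuming a trained network such as `TwoLayerNet` above; the zero-out convention for dropping a variable, the fixed ridge parameter `lam`, and all function names are illustrative assumptions and do not come from the authors' implementation (https://github.com/Willett-Group/lazyvi).

```python
import torch
import torch.nn as nn

def per_sample_gradients(model: nn.Module, x: torch.Tensor) -> torch.Tensor:
    """Gradient of the scalar network output w.r.t. all parameters,
    one row per sample: an (n, p) matrix of lazy-training features."""
    params = list(model.parameters())
    rows = []
    for xi in x:
        out = model(xi.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, params)
        rows.append(torch.cat([g.reshape(-1) for g in grads]))
    return torch.stack(rows)

def lazy_vi(model, X_train, y_train, X_test, y_test, j, lam=1.0):
    """Estimate the importance of variable j without retraining the network."""
    # "Drop" variable j by zeroing its column (one common convention; the
    # paper's exact dropout scheme is not restated in the table above).
    Xtr_dropj = X_train.clone(); Xtr_dropj[:, j] = 0.0
    Xte_dropj = X_test.clone();  Xte_dropj[:, j] = 0.0

    # Lazy features: per-sample gradients of the trained full network,
    # evaluated on the dropout training data.
    G = per_sample_gradients(model, Xtr_dropj)                 # (n1, p)
    with torch.no_grad():
        resid = y_train - model(Xtr_dropj).squeeze(-1)         # Y minus dropout estimates

    # Ridge regression of residuals on gradient features -> correction delta_j.
    p = G.shape[1]
    delta = torch.linalg.solve(G.T @ G + lam * torch.eye(p), G.T @ resid)

    # Linearized "retrained" reduced model on the held-out n2 samples,
    # and the plug-in VI estimate (reduced loss minus full loss).
    G_test = per_sample_gradients(model, Xte_dropj)
    with torch.no_grad():
        full_pred = model(X_test).squeeze(-1)
        reduced_pred = model(Xte_dropj).squeeze(-1) + G_test @ delta
    vi = torch.mean((y_test - reduced_pred) ** 2) - torch.mean((y_test - full_pred) ** 2)
    return vi.item()
```

The fixed `lam` stands in for the per-variable ridge parameter λ̂_j that the paper selects by K-fold cross-validation (their Algorithm 2 in Appendix C.2), and the standard errors mentioned in the row above are not reproduced in this sketch.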