Lazy Estimation of Variable Importance for Large Neural Networks

Authors: Yue Gao, Abby Stevens, Garvesh Raskutti, Rebecca Willett

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate through simulations that our method is fast and accurate under several data-generating regimes, and we demonstrate its real-world applicability on a seasonal climate forecasting example.
Researcher Affiliation | Academia | ¹Department of Statistics, University of Wisconsin, Madison; ²Department of Statistics, University of Chicago; ³Department of Computer Science, University of Chicago.
Pseudocode | Yes | Algorithm 1: Lazy training for VI
Open Source Code | Yes | Our implementation is available at https://github.com/Willett-Group/lazyvi.
Open Datasets | Yes | We use simulations from the Community Earth System Model-Large Ensemble project (CESM-LENS; Kay et al. (2015); de La Beaujardière et al. (2019)).
Dataset Splits | Yes | We estimate h_{θ̂_f} and h_{θ̂_{-j}} using n_1 < n samples as training data, and use the remaining n_2 = n − n_1 samples to estimate VI. For the lazy training method, which we call Lazy VI, we use the training data to estimate the full model parameters, compute the gradient of the network with respect to each model parameter for each training sample, and then regress these gradients against the difference between Y and the dropout estimates from the training data to estimate the parameter correction ∆θ_j for variable j. We then update the full model parameters using this learned correction to compute the VI estimate and its associated standard errors. See Algorithm 1 for full details. Theorem 4.4 makes the assumption that the ridge parameter λ from Equation (15) is large. Since we are ultimately interested in estimating h_{θ̂_{-j}} and not ∆θ_j, we evaluate h_{θ̂_f + ∆θ_j}(·) through K-fold CV to choose λ̂_j for each variable (Algorithm 2 in Appendix C.2).
Hardware Specification | No | The paper discusses training neural networks but does not provide specific details about the hardware used, such as GPU models, CPU specifications, or memory.
Software Dependencies | No | The paper mentions that its implementation is available on GitHub, implying specific software dependencies, but it does not explicitly list software names with their version numbers within the text.
Experiment Setup | Yes | For these experiments, we train a wide, fully connected two-layer neural network with ReLU activation for all simulations. Unless otherwise specified, the width of the hidden layer in the training network is m = 50.
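
The Experiment Setup row describes the training network only at a high level. Below is a minimal PyTorch sketch of a wide, fully connected two-layer ReLU network with hidden width m = 50, matching that description; the class name, argument names, and output dimension are illustrative choices, not taken from the authors' lazyvi code.

```python
import torch
import torch.nn as nn

class TwoLayerReLU(nn.Module):
    """Wide two-layer fully connected network with ReLU activation.

    Hidden width m = 50 matches the default reported in the paper's
    experiment setup; everything else here is an illustrative choice.
    """
    def __init__(self, d_in: int, m: int = 50):
        super().__init__()
        self.hidden = nn.Linear(d_in, m)
        self.out = nn.Linear(m, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.out(torch.relu(self.hidden(x)))
```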
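The Dataset Splits row summarizes the Lazy VI procedure (Algorithm 1): fit the full network on n_1 training samples, linearize it around the fitted parameters θ̂_f, regress the residuals Y minus the dropout estimates on the per-sample parameter gradients with a ridge penalty to obtain the correction ∆θ_j, and evaluate the corrected network on the remaining n_2 samples. The sketch below follows that outline for a PyTorch model such as the TwoLayerReLU above, under assumptions the quoted text does not settle: variable j is "dropped" by zeroing its column, the gradients are taken at the dropout inputs, and the VI estimate is a difference of held-out mean squared errors. The function names (lazy_vi, per_sample_param_grads) are hypothetical, not the authors' implementation.

```python
import copy
import torch
from torch.nn.utils import parameters_to_vector, vector_to_parameters

def per_sample_param_grads(model, X):
    """Gradient of the scalar network output w.r.t. all parameters, one
    flattened row per sample (the design matrix of the lazy, linearized model)."""
    rows = []
    for x in X:
        out = model(x.unsqueeze(0)).squeeze()
        grads = torch.autograd.grad(out, tuple(model.parameters()))
        rows.append(torch.cat([g.reshape(-1) for g in grads]).detach())
    return torch.stack(rows)                            # shape (n1, p)

def lazy_vi(model, X_tr, y_tr, X_te, y_te, j, lam=1.0):
    """Illustrative Lazy VI point estimate for variable j (no standard errors)."""
    # "Dropout" inputs: remove variable j's information by zeroing its column
    # (an assumption; the paper may handle the dropped variable differently).
    X_tr_j = X_tr.clone(); X_tr_j[:, j] = 0.0
    X_te_j = X_te.clone(); X_te_j[:, j] = 0.0

    with torch.no_grad():
        resid = y_tr - model(X_tr_j).squeeze()          # Y minus dropout estimates

    # Ridge regression of the residuals on the parameter gradients gives the
    # parameter correction; lam plays the role of the ridge parameter λ.
    G = per_sample_param_grads(model, X_tr_j)           # (n1, p)
    p = G.shape[1]
    delta = torch.linalg.solve(G.T @ G + lam * torch.eye(p), G.T @ resid)

    # Apply the learned correction to a copy of the fitted full network.
    corrected = copy.deepcopy(model)
    vector_to_parameters(parameters_to_vector(model.parameters()) + delta,
                         corrected.parameters())

    # VI estimate on the held-out n2 samples: reduced-model MSE minus full-model MSE.
    with torch.no_grad():
        mse_full = torch.mean((y_te - model(X_te).squeeze()) ** 2)
        mse_reduced = torch.mean((y_te - corrected(X_te_j).squeeze()) ** 2)
    return (mse_reduced - mse_full).item()
```

Per the quoted text, the paper chooses the ridge parameter per variable by K-fold cross-validation over the fit of h_{θ̂_f + ∆θ_j} (Algorithm 2 in Appendix C.2); in this sketch lam is left as a fixed argument for brevity.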