Consistent feature selection for analytic deep neural networks

Authors: Vu C. Dinh, Lam S. Ho

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The simulations, implemented in PyTorch, focus on single-output deep feed-forward networks with three hidden layers of constant width. In these experiments, regularizing constants are chosen from a coarse grid {0.001, 0.01, 0.05, 0.1, 0.5, 1, 2} with γ = 2, using average test errors from random train-test splits of the corresponding dataset. The algorithms are trained over 20000 epochs using proximal gradient descent, which allows the exact support of the estimators to be identified without using a cut-off value for selection. In the first experiment, we consider a network with three hidden layers of 20 nodes. The input consists of 50 features, 10 of which are significant while the others are rendered insignificant by setting the corresponding weights to zero. We generate 100 datasets of size n = 5000 from the generic model Y = f(X) + ε, where ε ∼ N(0, 1) and the non-zero weights of f are sampled independently from N(0, 1). We perform GL and GL+AGL on each simulated dataset with regularizing constants chosen using average test errors from three random three-fold train-test splits. Overall, GL+AGL has superior performance, selecting the correct support in 63 out of 100 runs, while GL cannot identify the support in any run. Except for one pathological case in which both GL and GL+AGL choose a constant model, GL always selects the correct significant inputs but fails to de-select the insignificant ones (Figure 1, left panel), while GL+AGL always performs well on the insignificant inputs but sometimes over-shrinks the significant ones (Figure 1, right panel). Next, the methods are applied to the Boston housing dataset (footnote 3). A hedged data-generation sketch for this simulation appears after the table.
Researcher Affiliation | Academia | Vu Dinh, Department of Mathematical Sciences, University of Delaware, Delaware, USA, vucdinh@udel.edu; Lam Si Tung Ho, Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada, Lam.Ho@dal.ca
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/vucdinh/alg-net.
Open Datasets | Yes | Boston housing dataset (footnote 3: http://lib.stat.cmu.edu/datasets/boston). The dataset consists of 506 observations of house prices and 13 predictors. A hedged loading sketch appears after the table.
Dataset Splits | No | The paper mentions 'average test errors from random train-test splits', 'three random three-fold train-test splits', and, for the Boston housing dataset, '20 random train-test splits (with the size of the test sets being 25% of the original dataset)'. While test splits are specified, there is no explicit mention of a separate validation split, and hyperparameter tuning is described only as being based on average test errors over these splits. A minimal split-and-evaluate sketch appears after the table.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, or cloud computing specifications.
Software Dependencies | No | The paper mentions that simulations were 'implemented in Pytorch' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The simulations, implemented in PyTorch, focus on single-output deep feed-forward networks with three hidden layers of constant width. In these experiments, regularizing constants are chosen from a coarse grid {0.001, 0.01, 0.05, 0.1, 0.5, 1, 2} with γ = 2, using average test errors from random train-test splits of the corresponding dataset. The algorithms are trained over 20000 epochs using proximal gradient descent, which allows the exact support of the estimators to be identified without using a cut-off value for selection. A hedged sketch of one such proximal step appears after the table.
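
The simulation described in the Research Type row can be sketched as follows. This is a minimal, hypothetical reconstruction built only from the details quoted above (50 inputs, 10 significant features, three hidden layers of 20 nodes, n = 5000, Gaussian noise, N(0, 1) weights); the tanh activation and the choice to generate data from a network of the same architecture are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, p, p_significant = 5000, 50, 10

# Data-generating network: three hidden layers of 20 nodes with an analytic
# activation (tanh is an assumption; the excerpt does not name one).
f_true = nn.Sequential(
    nn.Linear(p, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 1),
)
with torch.no_grad():
    for layer in f_true:
        if isinstance(layer, nn.Linear):
            layer.weight.normal_(0.0, 1.0)   # non-zero weights sampled from N(0, 1)
            layer.bias.normal_(0.0, 1.0)
    # Only the first 10 inputs are significant: zero the columns of the first
    # weight matrix corresponding to the remaining 40 features.
    f_true[0].weight[:, p_significant:] = 0.0

X = torch.randn(n, p)
with torch.no_grad():
    y = f_true(X) + torch.randn(n, 1)        # Y = f(X) + eps, eps ~ N(0, 1)
```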
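The Experiment Setup row notes that proximal gradient descent yields estimators whose support can be read off exactly, because the proximal operator of the group penalty sets entire input groups to zero. The sketch below shows one such step for a group lasso penalty on the columns of the first-layer weight matrix (one group per input feature); the step size, the placement of the penalty, and the name `prox_group_lasso_step` are assumptions, not the authors' exact update.

```python
import torch

def prox_group_lasso_step(net, loss_fn, X, y, lr=1e-3, lam=0.05):
    """One proximal gradient step: a plain gradient step on the data-fit loss,
    followed by group soft-thresholding of each input feature's column in the
    first layer, which produces exact zeros for de-selected features."""
    net.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    with torch.no_grad():
        for param in net.parameters():
            param -= lr * param.grad
        W = net[0].weight                        # shape: (width, n_features)
        col_norms = W.norm(dim=0, keepdim=True)  # one group per input feature
        shrink = torch.clamp(1.0 - lr * lam / (col_norms + 1e-12), min=0.0)
        W *= shrink                              # zeroed columns = de-selected inputs
    return loss.item()
```

Running this step for many epochs and then reading off the non-zero columns of `net[0].weight` gives the selected support without any cut-off. For GL+AGL, the penalty would typically be re-weighted per feature by the inverse of an initial GL estimate's group norm raised to γ = 2; that adaptive stage is omitted here.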
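For the Open Datasets row, the Boston housing data can be read directly from the StatLib URL in the footnote. The raw file stores each of the 506 records across two physical lines after a 22-line preamble, so the columns have to be re-assembled; the variable names below are illustrative.

```python
import numpy as np
import pandas as pd

URL = "http://lib.stat.cmu.edu/datasets/boston"

raw = pd.read_csv(URL, sep=r"\s+", skiprows=22, header=None)
X_boston = np.hstack([raw.values[::2, :], raw.values[1::2, :2]])   # 506 x 13 predictors
y_boston = raw.values[1::2, 2]                                     # 506 house prices (MEDV)
```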
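The Dataset Splits row quotes 20 random train-test splits with 25% of the data held out for testing. A minimal version of that evaluation loop is sketched below, reusing `X_boston` and `y_boston` from the loading sketch; `fit_regularized_network` is a hypothetical stand-in for training a GL- or GL+AGL-regularized network, and averaging mean squared errors over the splits is an assumption about how the reported numbers were obtained.

```python
import numpy as np
from sklearn.model_selection import train_test_split

test_errors = []
for seed in range(20):                                    # 20 random splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_boston, y_boston, test_size=0.25, random_state=seed)
    model = fit_regularized_network(X_tr, y_tr)           # hypothetical training routine
    y_hat = model.predict(X_te)                           # hypothetical predict method
    test_errors.append(np.mean((y_hat - y_te) ** 2))

avg_test_error = float(np.mean(test_errors))
```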