Consistent feature selection for analytic deep neural networks

Authors: Vu C. Dinh, Lam S. Ho

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The simulations, implemented in PyTorch, focus on single-output deep feed-forward networks with three hidden layers of constant width. In these experiments, regularizing constants are chosen from a coarse grid {0.001, 0.01, 0.05, 0.1, 0.5, 1, 2} with γ = 2, using average test errors from random train-test splits of the corresponding dataset. The algorithms are trained over 20000 epochs using proximal gradient descent, which allows the exact support of the estimators to be identified without using a cut-off value for selection. In the first experiment, we consider a network with three hidden layers of 20 nodes. The input consists of 50 features, 10 of which are significant while the others are rendered insignificant by setting the corresponding weights to zero. We generate 100 datasets of size n = 5000 from the generic model Y = f(X) + ε, where ε ∼ N(0, 1) and the non-zero weights of f are sampled independently from N(0, 1). We perform GL and GL+AGL on each simulated dataset with regularizing constants chosen using average test errors from three random three-fold train-test splits. Overall, GL+AGL has superior performance, selecting the correct support in 63 out of 100 runs, while GL cannot identify the support in any run. Except for one pathological case in which both GL and GL+AGL choose a constant model, GL always selects the correct significant inputs but fails to de-select the insignificant ones (Figure 1, left panel), while GL+AGL always performs well on the insignificant inputs but sometimes over-shrinks the significant ones (Figure 1, right panel). Next, the methods are applied to the Boston housing dataset (footnote 3). A hedged data-generation sketch for this simulation appears after the table.
Researcher Affiliation | Academia | Vu Dinh, Department of Mathematical Sciences, University of Delaware, Delaware, USA, vucdinh@udel.edu; Lam Si Tung Ho, Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, Canada, Lam.Ho@dal.ca
Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks.
Open Source Code | Yes | The code is available at https://github.com/vucdinh/alg-net.
Open Datasets | Yes | Boston housing dataset (footnote 3: http://lib.stat.cmu.edu/datasets/boston). The dataset consists of 506 observations of house prices and 13 predictors. A hedged loading sketch appears after the table.
Dataset Splits | No | The paper mentions 'average test errors from random train-test splits', 'three random three-fold train-test splits', and, for the Boston housing dataset, '20 random train-test splits (with the size of the test sets being 25% of the original dataset)'. While test splits are specified, there is no explicit mention of a separate validation split, and hyperparameter tuning is described only as being based on average test errors over these splits. A minimal split-and-evaluate sketch appears after the table.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as CPU or GPU models, or cloud computing specifications.
Software Dependencies | No | The paper mentions that simulations were 'implemented in Pytorch' but does not provide specific version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The simulations, implemented in PyTorch, focus on single-output deep feed-forward networks with three hidden layers of constant width. In these experiments, regularizing constants are chosen from a coarse grid {0.001, 0.01, 0.05, 0.1, 0.5, 1, 2} with γ = 2, using average test errors from random train-test splits of the corresponding dataset. The algorithms are trained over 20000 epochs using proximal gradient descent, which allows the exact support of the estimators to be identified without using a cut-off value for selection. A hedged sketch of one such proximal step appears after the table.
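
The simulation described in the Research Type row can be sketched as follows. This is a minimal, hypothetical reconstruction built only from the details quoted above (50 inputs, 10 significant features, three hidden layers of 20 nodes, n = 5000, Gaussian noise, N(0, 1) weights); the tanh activation and the choice to generate data from a network of the same architecture are assumptions, not the authors' exact configuration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
n, p, p_significant = 5000, 50, 10

# Data-generating network: three hidden layers of 20 nodes with an analytic
# activation (tanh is an assumption; the excerpt does not name one).
f_true = nn.Sequential(
    nn.Linear(p, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 20), nn.Tanh(),
    nn.Linear(20, 1),
)
with torch.no_grad():
    for layer in f_true:
        if isinstance(layer, nn.Linear):
            layer.weight.normal_(0.0, 1.0)   # non-zero weights sampled from N(0, 1)
            layer.bias.normal_(0.0, 1.0)
    # Only the first 10 inputs are significant: zero the columns of the first
    # weight matrix corresponding to the remaining 40 features.
    f_true[0].weight[:, p_significant:] = 0.0

X = torch.randn(n, p)
with torch.no_grad():
    y = f_true(X) + torch.randn(n, 1)        # Y = f(X) + eps, eps ~ N(0, 1)
```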
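The Experiment Setup row notes that proximal gradient descent yields estimators whose support can be read off exactly, because the proximal operator of the group penalty sets entire input groups to zero. The sketch below shows one such step for a group lasso penalty on the columns of the first-layer weight matrix (one group per input feature); the step size, the placement of the penalty, and the name `prox_group_lasso_step` are assumptions, not the authors' exact update.

```python
import torch

def prox_group_lasso_step(net, loss_fn, X, y, lr=1e-3, lam=0.05):
    """One proximal gradient step: a plain gradient step on the data-fit loss,
    followed by group soft-thresholding of each input feature's column in the
    first layer, which produces exact zeros for de-selected features."""
    net.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    with torch.no_grad():
        for param in net.parameters():
            param -= lr * param.grad
        W = net[0].weight                        # shape: (width, n_features)
        col_norms = W.norm(dim=0, keepdim=True)  # one group per input feature
        shrink = torch.clamp(1.0 - lr * lam / (col_norms + 1e-12), min=0.0)
        W *= shrink                              # zeroed columns = de-selected inputs
    return loss.item()
```

Running this step for many epochs and then reading off the non-zero columns of `net[0].weight` gives the selected support without any cut-off. For GL+AGL, the penalty would typically be re-weighted per feature by the inverse of an initial GL estimate's group norm raised to γ = 2; that adaptive stage is omitted here.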
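For the Open Datasets row, the Boston housing data can be read directly from the StatLib URL in the footnote. The raw file stores each of the 506 records across two physical lines after a 22-line preamble, so the columns have to be re-assembled; the variable names below are illustrative.

```python
import numpy as np
import pandas as pd

URL = "http://lib.stat.cmu.edu/datasets/boston"

raw = pd.read_csv(URL, sep=r"\s+", skiprows=22, header=None)
X_boston = np.hstack([raw.values[::2, :], raw.values[1::2, :2]])   # 506 x 13 predictors
y_boston = raw.values[1::2, 2]                                     # 506 house prices (MEDV)
```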
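The Dataset Splits row quotes 20 random train-test splits with 25% of the data held out for testing. A minimal version of that evaluation loop is sketched below, reusing `X_boston` and `y_boston` from the loading sketch; `fit_regularized_network` is a hypothetical stand-in for training a GL- or GL+AGL-regularized network, and averaging mean squared errors over the splits is an assumption about how the reported numbers were obtained.

```python
import numpy as np
from sklearn.model_selection import train_test_split

test_errors = []
for seed in range(20):                                    # 20 random splits
    X_tr, X_te, y_tr, y_te = train_test_split(
        X_boston, y_boston, test_size=0.25, random_state=seed)
    model = fit_regularized_network(X_tr, y_tr)           # hypothetical training routine
    y_hat = model.predict(X_te)                           # hypothetical predict method
    test_errors.append(np.mean((y_hat - y_te) ** 2))

avg_test_error = float(np.mean(test_errors))
```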