Sparse tree-based Initialization for Neural Networks

Authors: Patrick Lutz, Ludovic Arnould, Claire Boyer, Erwan Scornet

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Numerical experiments on several tabular data sets show the benefits of this new, simple and easy-to-use method, both in terms of generalization capacity and computation time, compared to default MLP initialization and even to existing complex deep learning solutions.
Researcher Affiliation | Academia | Boston University; LPSM, Sorbonne University; CMAP, Ecole Polytechnique
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is publically available at https://github.com/LutzPatrick/SparseTreeBasedInit.
Open Datasets | Yes | Datasets & learning tasks We compare prediction performances on a total of 10 datasets: 3 regression datasets (Airbnb, Diamonds and Housing), 5 binary classification datasets (Adult, Bank, Blastchar, Heloc, Higgs) and 2 multi-class classification datasets (Covertype and Volkert). ... Tables 2 & 3 respectively give links to the platforms storing the data sets (four of them are available on the UCI Machine Learning Repository, Dua & Graff, 2017) and an overview of their main properties.
Dataset Splits | Yes | Performances are measured via the MSE for regression, the AUROC score (AUC) for binary classification and the accuracy (Acc.) for multi-class classification, averaging 5 runs of 5-fold cross-validation. ... The quantity minimized during HP tuning is the model's validation loss, and the smallest validation loss that occurred during training for MLP-based models. (A cross-validation sketch is given after the table.)
Hardware Specification | Yes | All methods are trained on a 32 GB RAM machine using 12 Intel Core i7-8700K CPUs, and one NVIDIA GeForce RTX 2080 GPU when possible (only the GBDT and MLP implementations including SAINT use the GPU).
Software Dependencies | No | The paper mentions several software packages and libraries like 'sklearn', 'Forest Layer library', 'XGBoost library', 'pytorch', 'Adam optimizer', and 'optuna library', but it does not specify exact version numbers for these dependencies, which is required for reproducibility. (A helper for recording installed versions is sketched after the table.)
Experiment Setup | Yes | All NN are trained using the Adam optimizer (Kingma & Ba, 2014). All hyper-parameters (HP) are determined empirically using the optuna library (Akiba et al., 2019) for Bayesian optimization. ... An overview of all search spaces used for each method and the HP selected for experimental protocol P2 can be found in Appendix E.5. The quantity minimized during HP tuning is the model's validation loss, and the smallest validation loss that occurred during training for MLP-based models. (An optuna tuning sketch is given after the table.)
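
The following is a minimal sketch of the evaluation protocol quoted in the Dataset Splits row: 5 runs of 5-fold cross-validation, scored with MSE for regression, AUROC for binary classification, or accuracy for multi-class classification. It is an illustration only, not the authors' code; the make_model factory (assumed to be a scikit-learn-style estimator) and the task labels are assumptions.

# Illustrative only: 5 repetitions of 5-fold cross-validation with the
# task-dependent metrics reported in the paper. `make_model` and the
# `task` strings are assumptions, not part of the released code.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error, roc_auc_score, accuracy_score

def repeated_cv(make_model, X, y, task, n_runs=5, n_folds=5, seed=0):
    scores = []
    for run in range(n_runs):
        folds = KFold(n_splits=n_folds, shuffle=True, random_state=seed + run)
        for train_idx, test_idx in folds.split(X):
            model = make_model()
            model.fit(X[train_idx], y[train_idx])
            if task == "regression":    # MSE
                s = mean_squared_error(y[test_idx], model.predict(X[test_idx]))
            elif task == "binary":      # AUROC
                s = roc_auc_score(y[test_idx], model.predict_proba(X[test_idx])[:, 1])
            else:                       # multi-class accuracy
                s = accuracy_score(y[test_idx], model.predict(X[test_idx]))
            scores.append(s)
    return float(np.mean(scores)), float(np.std(scores))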
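
Since the Software Dependencies row notes that no exact library versions are reported, a small helper like the one below can at least record the versions actually installed when rerunning the experiments. It is a generic reproducibility aid, not part of the released repository, and the package list is an assumption drawn from the libraries the paper names.

from importlib.metadata import version, PackageNotFoundError

def record_versions(packages=("scikit-learn", "xgboost", "torch", "optuna")):
    # Package list is an assumption based on the libraries mentioned in the paper.
    found = {}
    for pkg in packages:
        try:
            found[pkg] = version(pkg)
        except PackageNotFoundError:
            found[pkg] = "not installed"
    return found

# Example usage: dump alongside experiment outputs.
# with open("environment_versions.txt", "w") as f:
#     for pkg, ver in record_versions().items():
#         f.write(f"{pkg}=={ver}\n")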
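
The Experiment Setup row describes Adam training with optuna-based Bayesian hyper-parameter search, where the tuning objective for MLP-based models is the smallest validation loss seen during training. The sketch below illustrates that loop under assumptions: the search-space bounds, the build_mlp factory, and the data loaders are placeholders, since the actual search spaces are listed in the paper's Appendix E.5.

import optuna
import torch

def objective(trial, build_mlp, train_loader, X_val, y_val, loss_fn, epochs=50):
    # Assumed search-space bounds; the paper's spaces are in its Appendix E.5.
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
    model = build_mlp(trial)  # architecture HPs can be sampled inside the factory
    opt = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)
    best_val = float("inf")
    for _ in range(epochs):
        model.train()
        for xb, yb in train_loader:
            opt.zero_grad()
            loss = loss_fn(model(xb), yb)
            loss.backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(X_val), y_val).item()
        best_val = min(best_val, val_loss)  # smallest validation loss during training
    return best_val

# study = optuna.create_study(direction="minimize")
# study.optimize(lambda t: objective(t, build_mlp, train_loader, X_val, y_val, loss_fn),
#                n_trials=100)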