Sparse tree-based Initialization for Neural Networks
Authors: Patrick Lutz, Ludovic Arnould, Claire Boyer, Erwan Scornet
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical experiments on several tabular data sets show the benefits of this new, simple and easy-to-use method, both in terms of generalization capacity and computation time, compared to default MLP initialization and even to existing complex deep learning solutions. |
| Researcher Affiliation | Academia | Boston University; LPSM, Sorbonne University; CMAP, Ecole Polytechnique |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is publicly available at https://github.com/LutzPatrick/SparseTreeBasedInit. |
| Open Datasets | Yes | Datasets & learning tasks We compare prediction performances on a total of 10 datasets: 3 regression datasets (Airbnb, Diamonds and Housing), 5 binary classification datasets (Adult, Bank, Blastchar, Heloc, Higgs) and 2 multi-class classification datasets (Covertype and Volkert). ... Tables 2 & 3 respectively give links to the platforms storing the data sets (four of them are available on the UCI Machine Learning Repository, Dua & Graff, 2017) and an overview of their main properties. |
| Dataset Splits | Yes | Performances are measured via the MSE for regression, the AUROC score (AUC) for binary classification and the accuracy (Acc.) for multi-class classification, averaging 5 runs of 5-fold cross-validation. ... The quantity minimized during HP tuning is the model's validation loss, and the smallest validation loss that occurred during training for MLP-based models. (A cross-validation sketch follows the table.) |
| Hardware Specification | Yes | All methods are trained on a 32 GB RAM machine using 12 Intel Core i7-8700K CPUs, and one NVIDIA GeForce RTX 2080 GPU when possible (only the GBDT and MLP implementations including SAINT use the GPU). |
| Software Dependencies | No | The paper names several software packages and libraries ('sklearn', the 'Forest Layer library', the 'XGBoost library', 'pytorch', the 'Adam optimizer', and the 'optuna library'), but it does not pin exact version numbers for any of them, which limits reproducibility. (A version-check sketch follows the table.) |
| Experiment Setup | Yes | All NN are trained using the Adam optimizer (Kingma & Ba, 2014). All hyper-parameters (HP) are determined empirically using the optuna library (Akiba et al., 2019) for Bayesian optimization. ... An overview of all search spaces used for each method and the HP selected for experimental protocol P2 can be found in Appendix E.5. The quantity minimized during HP tuning is the model's validation loss, and the smallest validation loss that occurred during training for MLP-based models. (A tuning-loop sketch follows the table.) |
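
The evaluation protocol quoted in the Dataset Splits row (5 independent runs of 5-fold cross-validation, scored with MSE, AUROC, or accuracy depending on the task) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the synthetic dataset and the `build_model` placeholder are assumptions, and the actual code lives in the repository linked above.

```python
# Minimal sketch: 5 runs of 5-fold cross-validation scored with AUROC.
# The dataset and `build_model` are placeholders, not the paper's code.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

def build_model():
    # Stand-in for the MLP / tree-initialized network evaluated in the paper.
    return LogisticRegression(max_iter=1000)

scores = []
for run in range(5):                                   # 5 independent runs ...
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=run)
    for train_idx, test_idx in cv.split(X, y):         # ... of 5-fold CV each
        model = build_model()
        model.fit(X[train_idx], y[train_idx])
        proba = model.predict_proba(X[test_idx])[:, 1]
        scores.append(roc_auc_score(y[test_idx], proba))  # AUROC for binary tasks

print(f"AUC: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

For the regression datasets, `KFold` and mean squared error would replace the stratified split and AUROC; for the multi-class datasets, accuracy would be the score.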
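Because the paper lists its dependencies without version pins, anyone reproducing the experiments has to record the versions of their own environment. A minimal way to do that in Python, assuming the usual PyPI package names for the libraries mentioned above (scikit-learn for sklearn, torch for pytorch):

```python
# Record the installed versions of the paper's named dependencies,
# since the paper itself does not pin them.
from importlib.metadata import version, PackageNotFoundError

for pkg in ["scikit-learn", "xgboost", "torch", "optuna"]:
    try:
        print(pkg, version(pkg))
    except PackageNotFoundError:
        print(pkg, "not installed")
```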
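The HP-tuning protocol quoted in the Experiment Setup row (Adam-trained networks, optuna Bayesian optimization, objective = smallest validation loss observed during training) can be sketched as below. The network, data, and search space are placeholders rather than the paper's settings; only the structure of the loop reflects the quoted protocol.

```python
# Minimal sketch: optuna Bayesian optimization (TPE by default) minimizing the
# validation loss of a small Adam-trained MLP. Data, network, and search space
# are illustrative placeholders, not the paper's configuration.
import optuna
import torch
import torch.nn as nn

torch.manual_seed(0)
X_train, y_train = torch.randn(800, 20), torch.randn(800, 1)   # placeholder data
X_val, y_val = torch.randn(200, 20), torch.randn(200, 1)

def objective(trial):
    lr = trial.suggest_float("lr", 1e-4, 1e-2, log=True)
    hidden = trial.suggest_int("hidden", 32, 256)
    model = nn.Sequential(nn.Linear(20, hidden), nn.ReLU(), nn.Linear(hidden, 1))
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    best_val = float("inf")
    for epoch in range(50):
        optimizer.zero_grad()
        loss = loss_fn(model(X_train), y_train)
        loss.backward()
        optimizer.step()
        with torch.no_grad():
            val_loss = loss_fn(model(X_val), y_val).item()
        best_val = min(best_val, val_loss)   # smallest validation loss seen so far
    return best_val                          # quantity minimized by optuna

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=20)
print(study.best_params)
```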