TANGOS: Regularizing Tabular Neural Networks through Gradient Orthogonalization and Specialization

Authors: Alan Jeffares, Tennison Liu, Jonathan Crabbé, Fergus Imrie, Mihaela van der Schaar

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type: Experimental
"In this section, we empirically evaluate TANGOS as a regularization method for improving generalization performance. We present our benchmark methods and training architecture, followed by extensive results on real-world datasets."
Researcher Affiliation: Academia
"Alan Jeffares, University of Cambridge (aj659@cam.ac.uk); Tennison Liu, University of Cambridge (tl522@cam.ac.uk); Jonathan Crabbé, University of Cambridge (jc2133@cam.ac.uk); Fergus Imrie, University of California, Los Angeles (imrie@ucla.edu); Mihaela van der Schaar, University of Cambridge & Alan Turing Institute (mv472@cam.ac.uk)"
Pseudocode: Yes
"Algorithm 1: TANGOS regularization"
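The pseudocode above is not reproduced in this report, but the two penalties TANGOS is built on can be illustrated. The following is a hedged sketch, not the authors' code: it assumes each hidden neuron's attribution is its input gradient a_i = ∂h_i/∂x, penalizes the mean L1 norm of attributions (specialization), and penalizes the pairwise cosine similarity between neurons' attributions (orthogonalization). The function name `tangos_penalties` and the toy sub-network are illustrative.

```python
# Hedged sketch of TANGOS-style penalties on neuron attributions.
import torch
import torch.nn.functional as F

def tangos_penalties(model, x):
    """Return (specialization, orthogonalization) penalties for one batch.

    `model(x)` is assumed to return hidden activations of shape (batch, n_units).
    """
    x = x.clone().requires_grad_(True)
    h = model(x)                                  # (batch, n_units)
    n_units = h.shape[1]
    # Attribution of each hidden unit w.r.t. the inputs (create_graph=True
    # so the penalties themselves remain differentiable for training).
    attribs = []
    for i in range(n_units):
        g, = torch.autograd.grad(h[:, i].sum(), x,
                                 create_graph=True, retain_graph=True)
        attribs.append(g)                         # (batch, n_features)
    A = torch.stack(attribs, dim=1)               # (batch, n_units, n_features)
    # Specialization: encourage sparse attributions via the mean L1 norm.
    l_spec = A.abs().mean()
    # Orthogonalization: penalize pairwise cosine similarity between units.
    A_norm = F.normalize(A, dim=-1)
    sim = torch.einsum('bif,bjf->bij', A_norm, A_norm)
    off_diag = sim - torch.diag_embed(torch.diagonal(sim, dim1=1, dim2=2))
    l_orth = off_diag.abs().mean()
    return l_spec, l_orth

# Toy usage: a single linear "hidden layer" stands in for the encoder.
torch.manual_seed(0)
net = torch.nn.Linear(4, 3)
xb = torch.randn(8, 4)
l_spec, l_orth = tangos_penalties(net, xb)
```

In a training loop these two terms would be added to the task loss with the weights λ1 and λ2 searched over in the experiment setup below.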
Open Source Code: Yes
"Code is provided on GitHub. (...) We have attempted to make our experimental results easily reproducible by both a detailed description of our experimental procedure and providing the code used to produce our results (https://github.com/alanjeffares/TANGOS)."
Open Datasets: Yes
"We employ 20 real-world tabular datasets from the UCI machine learning repository. Each dataset is split into 80% for cross-validation and the remaining 20% for testing. (...) All datasets used in this work can be freely downloaded from the UCI repository (Dua et al., 2017) with specific details provided in Appendix L."
Dataset Splits: Yes
"Each dataset is split into 80% for cross-validation and the remaining 20% for testing. The features are standardized using statistics from the training data only, such that features have mean 0 and standard deviation 1, and categorical variables are one-hot encoded. (...) In all experiments, we use 5-fold cross-validation to train and validate each benchmark. We select the model which achieves the lowest validation error and provide a final evaluation on a held-out test set."
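The split protocol quoted above can be sketched in a few lines. This is an assumed implementation of the stated procedure, not the authors' code: an 80/20 train/test split, standardization fit on the training portion only (avoiding test-set leakage), and 5-fold index generation over the training portion; the helper name `make_folds` is illustrative.

```python
# Hedged sketch of the 80/20 split + train-only standardization + 5-fold CV.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(loc=3.0, scale=2.0, size=(100, 5))  # stand-in dataset

# 80% for cross-validation, remaining 20% held out for testing.
perm = rng.permutation(len(X))
n_train = int(0.8 * len(X))
train_idx, test_idx = perm[:n_train], perm[n_train:]

# Standardize with statistics from the training data only.
mu = X[train_idx].mean(axis=0)
sigma = X[train_idx].std(axis=0)
X_train = (X[train_idx] - mu) / sigma
X_test = (X[test_idx] - mu) / sigma   # same transform, never refit on test

def make_folds(n, k=5):
    """Split indices 0..n-1 into k (train, validation) index pairs."""
    idx = np.arange(n)
    return [(np.delete(idx, fold), fold) for fold in np.array_split(idx, k)]

folds = make_folds(len(X_train), k=5)
```

Model selection would then pick the configuration with the lowest validation error across the five folds before the single held-out evaluation on `X_test`.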
Hardware Specification: Yes
"All experiments were run on NVIDIA RTX A4000 GPUs."
Software Dependencies: No
The paper mentions "PyTorch (Paszke et al., 2019)" but does not specify a version number for PyTorch or any other software dependency.
Experiment Setup: Yes
"For the specialization parameter we search over λ1 ∈ {1, 10, 100} and for the orthogonalization parameter we search over λ2 ∈ {0.1, 1}. (...) For the regularizer coefficients, we search over λ ∈ {0.1, 0.01, 0.001}, where regularization is applied to all layers. Next, we consider Dropout (DO), with drop rate p ∈ {10%, 25%, 50%}, and apply DO after every dense layer during training. We also consider the implicit regularization of batch normalization (BN). Lastly, we evaluate the data augmentation techniques Input Noise (IN), where we use additive Gaussian noise with mean 0 and standard deviation σ ∈ {0.1, 0.05, 0.01}, and Mix Up (MU). Furthermore, each training run applies early stopping with a patience of 30 epochs. (...) all regularizers are applied to an MLP with two ReLU-activated hidden layers, where each hidden layer has d_H + 1 neurons. The models are trained using the Adam optimizer with a dataset-dependent learning rate from {0.01, 0.001, 0.0001} and are trained for up to a maximum of 200 epochs."
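The two augmentation baselines named in the setup are standard and easy to illustrate. Below is a hedged sketch of assumed implementations, not the paper's code: additive Gaussian input noise with the stated mean 0 and tunable σ, and Mix Up in its usual form (a Beta-distributed convex combination of randomly paired examples and labels).

```python
# Hedged sketches of the Input Noise (IN) and Mix Up (MU) baselines.
import numpy as np

rng = np.random.default_rng(0)

def input_noise(X, sigma=0.1):
    """Additive Gaussian noise with mean 0 and standard deviation sigma."""
    return X + rng.normal(0.0, sigma, size=X.shape)

def mixup(X, y, alpha=1.0):
    """Convex-combine each example (and its label) with a random partner."""
    lam = rng.beta(alpha, alpha)
    pair = rng.permutation(len(X))
    return lam * X + (1 - lam) * X[pair], lam * y + (1 - lam) * y[pair]

X = rng.normal(size=(16, 4))
y = rng.normal(size=(16, 1))
X_noisy = input_noise(X, sigma=0.05)       # σ drawn from the searched grid
X_mix, y_mix = mixup(X, y)
```

In the benchmark, σ ∈ {0.1, 0.05, 0.01} would be tuned per dataset like the other regularizer hyperparameters.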