TabMT: Generating tabular data with masked transformers

Authors: Manbir Gulati, Paul Roysdon

NeurIPS 2023

Reproducibility assessment: each variable below is listed with its result, followed by the LLM's supporting response.

Research Type: Experimental
In this section, we present a comprehensive evaluation of TabMT's effectiveness across an extensive range of tabular datasets. Our analysis involves a thorough comparison with state-of-the-art approaches, encompassing nearly all generative model families. To ensure a robust assessment, we evaluate across several dimensions and metrics.

Researcher Affiliation: Industry
Manbir S. Gulati, AI Accelerator, Leidos Inc., Manbir.S.Gulati@leidos.com; Paul F. Roysdon, AI Accelerator, Leidos Inc., Paul.Roysdon@leidos.com

Pseudocode: Yes
Detailed pseudocode is available in the Appendix.

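For orientation, here is a minimal sketch of the general masked-generation idea the pseudocode covers: start from a fully masked row and fill in fields one at a time in a random order, sampling each field from the model's predicted distribution. The `DummyFieldModel` stand-in and the `generate_row` helper are illustrative assumptions, not the authors' implementation.

```python
import torch


class DummyFieldModel(torch.nn.Module):
    """Stand-in for a masked transformer: maps a row of token ids to per-field logits."""

    def __init__(self, vocab: int, dim: int = 32):
        super().__init__()
        self.emb = torch.nn.Embedding(vocab, dim)
        self.head = torch.nn.Linear(dim, vocab)

    def forward(self, row: torch.Tensor) -> torch.Tensor:
        return self.head(self.emb(row))  # (batch, fields, vocab)


def generate_row(model: torch.nn.Module, num_fields: int, mask_id: int = 0) -> torch.Tensor:
    """Fill a fully masked row field-by-field in a random order."""
    row = torch.full((1, num_fields), mask_id, dtype=torch.long)
    with torch.no_grad():
        for field in torch.randperm(num_fields):
            logits = model(row)[0, field]  # logits over this field's vocabulary
            row[0, field] = torch.distributions.Categorical(logits=logits).sample()
    return row


print(generate_row(DummyFieldModel(vocab=16), num_fields=6))
```
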
Open Source Code: No
The paper does not provide an explicit statement or link to open-source code for the described methodology.

Open Datasets: Yes
For our data quality and privacy experiments we use the same list of datasets and data splits as TabDDPM [12]. These 15 datasets range in size from 400 samples to 150,000 samples. They contain continuous, categorical, and integer features. The datasets range from 6 to 50 columns. For our scaling experiments we use the CIDDS-001 [20] dataset, which consists of NetFlow traffic from a simulated small business network.

Dataset Splits: Yes
For our data quality and privacy experiments we use the same list of datasets and data splits as TabDDPM [12]. [...] We do not use anomalous traffic from the dataset, and randomly select 5% of the dataset as the validation set for reporting results.

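As a concrete illustration of the split described above, the sketch below holds out a random 5% of rows as a validation set, assuming the data sits in a pandas DataFrame; the toy columns and the fixed seed are illustrative only, not the authors' preprocessing.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the benign NetFlow rows; the real CIDDS-001 columns differ.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "duration": rng.random(1_000),
    "bytes": rng.integers(0, 10_000, size=1_000),
})

# Randomly hold out 5% of rows for validation; the rest is used for training.
val_df = df.sample(frac=0.05, random_state=0)
train_df = df.drop(val_df.index)

print(len(train_df), len(val_df))  # 950 50
```
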
Hardware Specification: Yes
Each data quality experiment was conducted using a single A10 GPU. [...] Each model was trained on a single A10 GPU, with the exception of TabMT-L, which was trained using 4 V100s.

Software Dependencies: No
The paper mentions using the AdamW [14] optimizer and CatBoost [6] models, but does not provide specific version numbers for any software libraries, frameworks, or programming languages used in the implementation.

Experiment Setup: Yes
We use the AdamW [14] optimizer with a learning rate of 0.002 and weight decay of 0.01, a batch size of 2048, and a cosine annealing learning rate schedule for 350,000 training steps and 10,000 warm-up steps. [...] Table 4: Model topologies used in scaling experiments. The large model sizes here demonstrate we can scale well in terms of model size and dataset size.

Model     Width  Depth  Heads
TabMT-S      64     12      4
TabMT-M     384     12      8
TabMT-L     576     24     12
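
To make the reported configuration concrete, the sketch below wires up AdamW with a learning rate of 0.002 and weight decay of 0.01, plus a cosine-annealed schedule over 350,000 steps with 10,000 warm-up steps, assuming a standard PyTorch training loop. The placeholder model and the linear warm-up shape are assumptions, not the authors' code.

```python
import math

import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import LambdaLR

TOTAL_STEPS = 350_000
WARMUP_STEPS = 10_000
BATCH_SIZE = 2048  # reported batch size; unused in this minimal sketch

model = torch.nn.Linear(64, 64)  # placeholder for a TabMT-style transformer
optimizer = AdamW(model.parameters(), lr=2e-3, weight_decay=0.01)


def lr_lambda(step: int) -> float:
    """Linear warm-up for the first 10,000 steps, then cosine annealing to zero."""
    if step < WARMUP_STEPS:
        return step / max(1, WARMUP_STEPS)
    progress = (step - WARMUP_STEPS) / max(1, TOTAL_STEPS - WARMUP_STEPS)
    return 0.5 * (1.0 + math.cos(math.pi * progress))


scheduler = LambdaLR(optimizer, lr_lambda)

# In the training loop, after each batch:
#   optimizer.step(); scheduler.step(); optimizer.zero_grad()
```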