Arithmetic Feature Interaction Is Necessary for Deep Tabular Learning
Authors: Yi Cheng, Renjun Hu, Haochao Ying, Xing Shi, Jian Wu, Wei Lin
AAAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To test this point, we create a synthetic tabular dataset with a mild feature interaction assumption and examine a modified transformer architecture enabling arithmetical feature interactions, referred to as AMFormer. Results show that AMFormer outperforms strong counterparts in fine-grained tabular data modeling, data efficiency in training, and generalization. Our extensive experiments on real-world data also validate the consistent effectiveness, efficiency, and rationale of AMFormer. (An illustrative sketch of the multiplicative-interaction idea appears below the table.) |
| Researcher Affiliation | Collaboration | Yi Cheng (1,2,3,*), Renjun Hu (4), Haochao Ying (1,3,5), Xing Shi (4), Jian Wu (1,3,5), Wei Lin (4). Affiliations: (1) State Key Laboratory of Transvascular Implantation Devices of the Second Affiliated Hospital, Zhejiang University School of Medicine, China; (2) School of Software Technology, Zhejiang University, China; (3) Institute of Wenzhou, Zhejiang University, China; (4) Alibaba Group; (5) School of Public Health, Zhejiang University, China. Emails: {chengy1, haochaoying, wujian2000}@zju.edu.cn; {renjun.hrj, shubao.sx, weilin.lw}@alibaba-inc.com |
| Pseudocode | No | The paper includes equations and a detailed architectural diagram (Figure 2) to describe the methodology, but it does not contain a block explicitly labeled "Pseudocode" or "Algorithm". |
| Open Source Code | Yes | Code is available at https://github.com/aigc-apps/AMFormer. |
| Open Datasets | Yes | EP (Epsilon) is a simulated physics dataset from (PASCAL 2008)... HC (Home Credit Default Risk) uses both numerical and categorical features to predict clients' binary repayment ability, with a ratio of around 1:10 for positive to negative samples (Anna Montoya and Kotek 2018). CO (Covtype) is a multi-class classification dataset that utilizes surrounding characteristics to predict the types of trees growing in an area (Rossi and Ahmed 2015). MI (MSLR-WEB10K) is a learning-to-rank dataset for query-URL relevance ranking (Qin and Liu 2013). |
| Dataset Splits | Yes | Table 1 (Dataset statistics and evaluation settings), EP row: binary classification, metric Acc, 320,000 train / 80,000 validation / 100,000 test samples, 2,000 numerical features, no categorical features. |
| Hardware Specification | Yes | All tests are conducted on a machine with 104 Intel(R) Xeon(R) Platinum 8269CY CPUs and an NVIDIA Tesla A100-SXM-40GB. |
| Software Dependencies | Yes | All tested models are implemented with PyTorch v1.12 (Paszke et al. 2019). |
| Experiment Setup | Yes | We adopt Adam with betas=(0.9, 0.999) and eps=1e-8 for optimization. The learning rate first increases linearly to 1e-3 over the first 1k steps and then decays by 90% every 20k steps by default, except for the HC data, which uses an initial rate of 1e-4 and 4k decaying steps. The default batch size is 512, which is reduced to 32 for transformer-based methods on the EP data due to GPU memory limitations. The detailed hyper-parameters of all methods are reported in the supplement. (A sketch of this optimizer and schedule appears below the table.) |
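
The "arithmetic feature interaction" referenced in the Research Type row combines additive and multiplicative interactions inside attention. As a rough illustration of the multiplicative half only (this is not the authors' AMFormer code; the module, parameter names, and log-space trick below are assumptions made for this sketch), a weighted sum over log-magnitudes of feature embeddings corresponds to a softly weighted product in the original space:

```python
import torch
import torch.nn as nn

class MultiplicativeInteraction(nn.Module):
    """Hypothetical sketch of product-style feature interaction.

    Not the AMFormer implementation: it only illustrates that a weighted sum
    of log-magnitudes equals the log of a weighted geometric product.
    """

    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.query = nn.Linear(dim, dim)
        self.key = nn.Linear(dim, dim)
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, num_features, dim) embeddings of tabular feature tokens
        scores = self.query(x) @ self.key(x).transpose(-2, -1) / x.shape[-1] ** 0.5
        attn = torch.softmax(scores, dim=-1)
        # Attend in log space, then exponentiate: each output token becomes a
        # softly weighted product of the (absolute) input feature values.
        log_x = torch.log(x.abs() + self.eps)
        return torch.exp(attn @ log_x)

# Toy usage: a batch of 8 rows, 4 feature tokens, 16-dimensional embeddings.
tokens = torch.rand(8, 4, 16)
print(MultiplicativeInteraction(16)(tokens).shape)  # torch.Size([8, 4, 16])
```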
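
The quoted optimization recipe (Adam with betas=(0.9, 0.999) and eps=1e-8, linear warmup to 1e-3 over the first 1k steps, then a step decay every 20k steps) maps onto a standard PyTorch LambdaLR schedule. The sketch below uses a stand-in linear model and reads "decays by 90%" as multiplying the rate by 0.1; both choices are assumptions, not the authors' released configuration:

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import LambdaLR

# Stand-in model (replace with AMFormer or any tabular network); 2,000 inputs
# mirror the EP feature count, 2 outputs its binary classes.
model = nn.Linear(2000, 2)

# Quoted defaults: Adam with betas=(0.9, 0.999) and eps=1e-8, base rate 1e-3.
optimizer = Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

WARMUP_STEPS = 1_000    # linear warmup to the base learning rate
DECAY_EVERY = 20_000    # decay interval (4,000 for the HC data per the paper)
DECAY_FACTOR = 0.1      # "decays by 90%" read as multiplying the rate by 0.1

def lr_lambda(step: int) -> float:
    """Scale factor applied to the base learning rate at a given step."""
    if step < WARMUP_STEPS:
        return (step + 1) / WARMUP_STEPS
    return DECAY_FACTOR ** (step // DECAY_EVERY)

scheduler = LambdaLR(optimizer, lr_lambda)

# Minimal training-loop skeleton: step the scheduler once per optimizer step.
for _ in range(3):
    optimizer.zero_grad()
    loss = model(torch.randn(512, 2000)).sum()   # batch size 512 as quoted
    loss.backward()
    optimizer.step()
    scheduler.step()
```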