Learning Enhanced Representation for Tabular Data via Neighborhood Propagation

Authors: Kounianhua Du, Weinan Zhang, Ruiwen Zhou, Yangkun Wang, Xilong Zhao, Jiarui Jin, Quan Gan, Zheng Zhang, David P Wipf

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments on two important tabular data prediction tasks validate the superiority of the proposed PET model relative to other baselines. Additionally, we demonstrate the effectiveness of the model components and the feature enhancement ability of PET via various ablation studies and visualizations.
Researcher Affiliation | Collaboration | Kounianhua Du, Weinan Zhang, Ruiwen Zhou, Yangkun Wang, Xilong Zhao, Jiarui Jin: Department of Computer Science, Shanghai Jiao Tong University, {774581965, wnzhang, skyriver, espylacopa, zhaoxilong, jinjiarui97}@sjtu.edu.cn. Quan Gan, Zheng Zhang, David Wipf: Amazon, {quagan, zhaz, daviwipf}@amazon.com. Work done during internship at Amazon Web Services Shanghai AI Lab.
Pseudocode | No | The paper describes its methods using mathematical equations and descriptive text but does not include explicit pseudocode blocks or algorithm figures.
Open Source Code | Yes | The code is available at https://github.com/KounianhuaDu/PET.
Open Datasets | Yes | For the CTR prediction task, we conduct experiments on three large-scale datasets, i.e., Tmall [3], Taobao [4], and Alipay [5]. For the top-n recommendation task, we experiment on two widely-used public recommendation datasets, i.e., Movielens-1M [6] and LastFM [7]. (Footnotes provide URLs: [3] https://tianchi.aliyun.com/dataset/dataDetail?dataId=42 [4] https://tianchi.aliyun.com/dataset/dataDetail?dataId=649 [5] https://tianchi.aliyun.com/dataset/dataDetail?dataId=53 [6] https://grouplens.org/datasets/movielens/1m/ [7] http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html)
Dataset Splits | No | The paper describes a temporal split for train/test pools ("The earliest data instances are grouped into the retrieval pool. The latest data instances form the test pool. Then the remaining data instances are grouped into the train pool."), but it does not explicitly detail a separate validation split with percentages or counts for hyperparameter tuning.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments.
Software Dependencies | No | The paper mentions using Elasticsearch but does not provide specific version numbers for any software dependencies, frameworks, or libraries used in their experiments.
Experiment Setup | No | The paper states: "As for the hyperparameters, we test the number of GNN layers in {2, 3}. The embedding sizes of all the models are consistent to ensure the fair comparison. More detailed hyperparameters and experiment settings are provided in Appendix A.5." It defers the detailed settings to an appendix, rather than providing them in the main text.
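
The temporal split quoted in the Dataset Splits row can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the authors' code: the function name `temporal_pools`, the `timestamp` field, and the default fractions are all hypothetical, since the paper excerpt above gives no pool sizes.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def temporal_pools(
    records: List[Record],
    retrieval_frac: float = 0.5,   # assumed value; not stated in the source
    test_frac: float = 0.25,       # assumed value; not stated in the source
    time_key: str = "timestamp",   # hypothetical field name
) -> Tuple[List[Record], List[Record], List[Record]]:
    """Split records by time into (retrieval, train, test) pools."""
    if not 0 < retrieval_frac < 1 or not 0 < test_frac < 1:
        raise ValueError("fractions must lie strictly between 0 and 1")
    if retrieval_frac + test_frac >= 1:
        raise ValueError("fractions must leave room for a train pool")

    # Order instances from earliest to latest.
    ordered = sorted(records, key=lambda r: r[time_key])
    n = len(ordered)
    n_retrieval = int(n * retrieval_frac)
    n_test = int(n * test_frac)

    retrieval = ordered[:n_retrieval]          # earliest instances
    test = ordered[n - n_test:]                # latest instances
    train = ordered[n_retrieval:n - n_test]    # the remaining instances
    return retrieval, train, test


if __name__ == "__main__":
    demo = [{"timestamp": t} for t in [5, 1, 7, 3, 8, 2, 6, 4]]
    retrieval, train, test = temporal_pools(demo)
    print(len(retrieval), len(train), len(test))
```

Because the source does not specify a validation split, any held-out set for hyperparameter tuning would have to be carved out separately, for example from the train pool returned here.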