Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning Enhanced Representation for Tabular Data via Neighborhood Propagation
Authors: Kounianhua Du, Weinan Zhang, Ruiwen Zhou, Yangkun Wang, Xilong Zhao, Jiarui Jin, Quan Gan, Zheng Zhang, David P Wipf
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two important tabular data prediction tasks validate the superiority of the proposed PET model relative to other baselines. Additionally, we demonstrate the effectiveness of the model components and the feature enhancement ability of PET via various ablation studies and visualizations. |
| Researcher Affiliation | Collaboration | Kounianhua Du, Weinan Zhang, Ruiwen Zhou, Yangkun Wang, Xilong Zhao, Jiarui Jin (Department of Computer Science, Shanghai Jiao Tong University, EMAIL); Quan Gan, Zheng Zhang, David Wipf (Amazon, EMAIL). Work done during internship at Amazon Web Services Shanghai AI Lab. |
| Pseudocode | No | The paper describes its methods using mathematical equations and descriptive text but does not include explicit pseudocode blocks or algorithm figures. |
| Open Source Code | Yes | The code is available at https://github.com/KounianhuaDu/PET. |
| Open Datasets | Yes | For the CTR prediction task, we conduct experiments on three large-scale datasets, i.e., Tmall3, Taobao4, and Alipay5. For the top-n recommendation task, we experiment on two widely-used public recommendation datasets, i.e., Movielens-1M6 and LastFM7. (Footnotes provide URLs: 3https://tianchi.aliyun.com/dataset/dataDetail?dataId=42 4https://tianchi.aliyun.com/dataset/dataDetail?dataId=649 5https://tianchi.aliyun.com/dataset/dataDetail?dataId=53 6https://grouplens.org/datasets/movielens/1m/ 7http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html) |
| Dataset Splits | No | The paper describes a temporal split for train/test pools ("The earliest data instances are grouped into the retrieval pool. The latest data instances form the test pool. Then the remaining data instances are grouped into the train pool."), but it does not explicitly detail a separate validation split with percentages or counts for hyperparameter tuning. |
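The temporal split quoted above (earliest instances into a retrieval pool, latest into a test pool, the remainder into a train pool) can be sketched in a few lines. This is a hypothetical illustration only: the function name `temporal_split` and the pool fractions are assumptions for the example, not values from the paper.

```python
# Illustrative sketch of a temporal three-way split: the earliest records
# form the retrieval pool, the latest form the test pool, and the remainder
# form the train pool. Fractions are placeholder assumptions, not the
# paper's settings.
def temporal_split(records, retrieval_frac=0.4, test_frac=0.2):
    """Split (timestamp, payload) records into retrieval/train/test pools."""
    ordered = sorted(records, key=lambda r: r[0])  # oldest first
    n = len(ordered)
    n_retrieval = int(n * retrieval_frac)
    n_test = int(n * test_frac)
    retrieval_pool = ordered[:n_retrieval]          # earliest instances
    test_pool = ordered[n - n_test:]                # latest instances
    train_pool = ordered[n_retrieval:n - n_test]    # everything in between
    return retrieval_pool, train_pool, test_pool

# Example: 10 timestamped records -> 4 retrieval, 4 train, 2 test
records = [(t, f"x{t}") for t in range(10)]
retrieval, train, test = temporal_split(records)
```

Note that, as the table points out, a split like this yields no held-out validation pool; a fourth slice would be needed for hyperparameter tuning.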
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Elasticsearch but does not provide version numbers for any software dependencies, frameworks, or libraries used in its experiments. |
| Experiment Setup | No | The paper states: "As for the hyperparameters, we test the number of GNN layers in {2, 3}. The embedding sizes of all the models are consistent to ensure the fair comparison. More detailed hyperparameters and experiment settings are provided in Appendix A.5." It defers the detailed settings to an appendix, rather than providing them in the main text. |