Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning Enhanced Representation for Tabular Data via Neighborhood Propagation
Authors: Kounianhua Du, Weinan Zhang, Ruiwen Zhou, Yangkun Wang, Xilong Zhao, Jiarui Jin, Quan Gan, Zheng Zhang, David P Wipf
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments on two important tabular data prediction tasks validate the superiority of the proposed PET model relative to other baselines. Additionally, we demonstrate the effectiveness of the model components and the feature enhancement ability of PET via various ablation studies and visualizations. |
| Researcher Affiliation | Collaboration | Kounianhua Du, Weinan Zhang, Ruiwen Zhou, Yangkun Wang, Xilong Zhao, Jiarui Jin (Department of Computer Science, Shanghai Jiao Tong University, EMAIL); Quan Gan, Zheng Zhang, David Wipf (Amazon, EMAIL). Work done during internship at Amazon Web Services Shanghai AI Lab. |
| Pseudocode | No | The paper describes its methods using mathematical equations and descriptive text but does not include explicit pseudocode blocks or algorithm figures. |
| Open Source Code | Yes | The code is available at https://github.com/KounianhuaDu/PET. |
| Open Datasets | Yes | For the CTR prediction task, we conduct experiments on three large-scale datasets, i.e., Tmall3, Taobao4, and Alipay5. For the top-n recommendation task, we experiment on two widely-used public recommendation datasets, i.e., Movielens-1M6 and LastFM7. (Footnotes provide URLs: 3https://tianchi.aliyun.com/dataset/dataDetail?dataId=42 4https://tianchi.aliyun.com/dataset/dataDetail?dataId=649 5https://tianchi.aliyun.com/dataset/dataDetail?dataId=53 6https://grouplens.org/datasets/movielens/1m/ 7http://ocelma.net/MusicRecommendationDataset/lastfm-1K.html) |
| Dataset Splits | No | The paper describes a temporal split for train/test pools ("The earliest data instances are grouped into the retrieval pool. The latest data instances form the test pool. Then the remaining data instances are grouped into the train pool."), but it does not explicitly detail a separate validation split with percentages or counts for hyperparameter tuning. |
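The temporal split quoted above (earliest instances into a retrieval pool, latest into a test pool, the remainder into a train pool) can be sketched in a few lines. This is a hypothetical illustration only: the function name `temporal_split` and the pool fractions are assumptions for the example, not values from the paper.

```python
# Illustrative sketch of a temporal three-way split: the earliest records
# form the retrieval pool, the latest form the test pool, and the remainder
# form the train pool. Fractions are placeholder assumptions, not the
# paper's settings.
def temporal_split(records, retrieval_frac=0.4, test_frac=0.2):
    """Split (timestamp, payload) records into retrieval/train/test pools."""
    ordered = sorted(records, key=lambda r: r[0])  # oldest first
    n = len(ordered)
    n_retrieval = int(n * retrieval_frac)
    n_test = int(n * test_frac)
    retrieval_pool = ordered[:n_retrieval]          # earliest instances
    test_pool = ordered[n - n_test:]                # latest instances
    train_pool = ordered[n_retrieval:n - n_test]    # everything in between
    return retrieval_pool, train_pool, test_pool

# Example: 10 timestamped records -> 4 retrieval, 4 train, 2 test
records = [(t, f"x{t}") for t in range(10)]
retrieval, train, test = temporal_split(records)
```

Note that, as the table points out, a split like this yields no held-out validation pool; a fourth slice would be needed for hyperparameter tuning.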
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions using Elasticsearch but does not provide version numbers for any software dependencies, frameworks, or libraries used in its experiments. |
| Experiment Setup | No | The paper states: "As for the hyperparameters, we test the number of GNN layers in {2, 3}. The embedding sizes of all the models are consistent to ensure the fair comparison. More detailed hyperparameters and experiment settings are provided in Appendix A.5." It defers the detailed settings to an appendix, rather than providing them in the main text. |