Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Towards Multi-Table Learning: A Novel Paradigm for Complementarity Quantification and Integration

Authors: Zhang Junyu, Lizhong Ding, Minghong Zhang, Ye Yuan, Xingcan Li, Pengqi Li, Tihang Xi, Guoren Wang, Changsheng Li

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments show that ATCA-Net effectively leverages complementary information and that the CS metric accurately quantifies the richness of complementarity across multiple tables. To the best of our knowledge, this is the first work to establish theoretical and practical foundations for multi-table learning.
Researcher Affiliation | Academia | Junyu Zhang, Lizhong Ding, Minghong Zhang, Ye Yuan, Xingcan Li, Pengqi Li, Tihang Xi, Guoren Wang, Changsheng Li; Beijing Institute of Technology
Pseudocode | No | The paper describes the architecture and processes using mathematical formulations and textual descriptions, but it does not include an explicit pseudocode block or algorithm section labeled as such.
Open Source Code | No | We will release the code following publication.
Open Datasets | Yes | The synthetic datasets were derived from the OpenML repository, including blastchar, airline, diabetes, and covertype, with details provided in Table 5.
Dataset Splits | No | The paper states that it evaluates performance on classification tasks and reports the mean over five runs, and it mentions randomly sampling sub-tables for training. However, it does not provide train/validation/test split percentages, sample counts, or a detailed methodology for partitioning the overall datasets for evaluation.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, memory amounts, or other specifications of the machines used to run its experiments.
Software Dependencies | No | The paper mentions using BERT-based models [24] and Transformers, and refers to baselines such as XGBoost [7], SAINT [34], and FT-Transformer [21]. However, it does not specify version numbers for any software libraries or dependencies (e.g., Python, PyTorch, TensorFlow, or the Hugging Face Transformers library) that would be needed for replication.
Experiment Setup | Yes | To compute the complementarity coefficient across multiple tables, we set the hyperparameters α = 1 and γ = 1 in Equation (5). We set the L_pre combination weights λ_rec = 1 and λ_cor = 1 in Equation (9). During the training stage, we randomly sample sub-tables from the original tables by selecting rows and columns, with the number of rows fixed at 64 and the number of columns varying between 2 and the maximum number of columns. The sub-tables are first embedded by BERT, where each cell is represented as a 768-dimensional vector, which is then reduced to 192 dimensions via a fully connected layer to support larger table inputs. These embeddings are subsequently processed by a shared-weight adaptive encoder.
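
The sub-table sampling and dimensionality reduction described in the Experiment Setup entry can be sketched roughly as follows. This is an illustration only, not the authors' code, which had not been released at the time of this summary. The function names `sample_subtable` and `reduce_cell_embeddings` are hypothetical, uniform sampling without replacement is an assumption, and random vectors stand in for the BERT cell embeddings so that the example runs with NumPy alone.

```python
import numpy as np

ROWS_PER_SUBTABLE = 64   # fixed row count stated in the setup
MIN_COLUMNS = 2          # lower bound on sampled columns stated in the setup
BERT_DIM = 768           # per-cell embedding size stated in the setup
REDUCED_DIM = 192        # size after the fully connected layer


def sample_subtable(table, rng):
    """Return a random sub-table with 64 rows and between 2 and all columns."""
    n_rows, n_cols = table.shape
    if n_rows < ROWS_PER_SUBTABLE or n_cols < MIN_COLUMNS:
        raise ValueError("table is too small to sample a sub-table from")
    # Assumption: uniform sampling without replacement (the summary does not say).
    n_keep = rng.integers(MIN_COLUMNS, n_cols + 1)  # upper bound is exclusive
    row_idx = rng.choice(n_rows, size=ROWS_PER_SUBTABLE, replace=False)
    col_idx = np.sort(rng.choice(n_cols, size=n_keep, replace=False))
    return table[np.ix_(row_idx, col_idx)], row_idx, col_idx


def reduce_cell_embeddings(cell_embeddings, weight, bias):
    """Apply one linear layer per cell: (rows, cols, 768) -> (rows, cols, 192)."""
    return cell_embeddings @ weight + bias


rng = np.random.default_rng(0)
table = rng.normal(size=(500, 10))  # hypothetical numeric table
sub, row_idx, col_idx = sample_subtable(table, rng)

# Stand-in for BERT output: one random vector per cell (not a real BERT call).
cells = rng.normal(size=(*sub.shape, BERT_DIM))
weight = rng.normal(scale=0.02, size=(BERT_DIM, REDUCED_DIM))
bias = np.zeros(REDUCED_DIM)
reduced = reduce_cell_embeddings(cells, weight, bias)
print(sub.shape, reduced.shape)
```

In a real pipeline the `cells` array would come from a BERT encoder and the linear layer would be trained jointly with the rest of the model; the sketch only makes the shapes concrete (64 rows, a variable number of columns, 768 reduced to 192 per cell).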