Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Feature-aware Modulation for Learning from Temporal Tabular Data

Authors: Haorun Cai, Han-Jia Ye

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Benchmark evaluations validate the effectiveness of our method in handling temporal shifts in tabular data. ... Experiments demonstrate the effectiveness of our approach in handling temporal shifts across real-world datasets. ... We conduct experiments on the TabReD benchmark proposed by Rubachev et al. [41] ... Our main results on the TabReD benchmark [41] are summarized in Table 1. ... The ablation study in Section 6.4 and Table 2 further confirms the effectiveness of applying modulation at multiple levels of the MLP.
Researcher Affiliation	Academia	Hao-Run Cai, Han-Jia Ye; School of Artificial Intelligence, Nanjing University, China; National Key Laboratory for Novel Software Technology, Nanjing University, China
Pseudocode	No	The paper describes the 'Feature-aware Temporal Modulation' mechanism with a mathematical formula (Equation 2) and a figure (Figure 3) illustrating the framework. However, it does not present a formal 'Algorithm' or 'Pseudocode' block.
Open Source Code	Yes	We release the complete implementation of our method at the following repository: https://github.com/LAMDA-Tabular/Tabular-Temporal-Modulation.
Open Datasets	Yes	We use the TabReD [41] benchmark to evaluate the performance of our models. Furthermore, we adopt the refined training protocol and the data preprocessing procedures proposed by Cai & Ye [6].
Dataset Splits	Yes	Given the absence of a standardized training protocol for temporal tabular data, we further conduct a comparative analysis of model performance under different training setups. Table 7 reports the results of each method when trained and evaluated on randomly split training and validation sets, instead of the temporal training protocol proposed by Cai & Ye [6]. In addition, Table 8 presents the results obtained under the original protocol proposed by Rubachev et al. [41].
Hardware Specification	Yes	All deep learning methods were trained on 20 NVIDIA RTX 4090 (24 GB) GPUs. Classical machine learning methods were executed on 4 Intel Xeon Platinum 8352S CPUs.
Software Dependencies	Yes	Our experiments are run under Linux using Python 3.10 and PyTorch 2.0.1.
Experiment Setup	Yes	Our preprocessing, training, evaluation, and hyperparameter tuning setup follows the practices established in Cai & Ye [6]. Detailed experimental setup is provided in Section B. ... Hyperparameter optimization is performed using Optuna [1], with 100 trials for most methods. Due to computational constraints, FT-Transformer and TabR are tuned with 25 trials. The search space strictly follows the configurations used in Cai & Ye [6] and Rubachev et al. [41], and is also documented in our source code (available in the config/ folder). ... For all deep learning methods, we use a batch size of 1024 and the AdamW optimizer [28]. ... Model selection is based on the best performance on the validation set. Following Rubachev et al. [41] and Cai & Ye [6], we adopt an early stopping strategy with a patience of 16 epochs based on validation performance.
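The Dataset Splits row contrasts a temporal training protocol with randomly split training and validation sets (Table 7). The sketch below illustrates only the general difference between a time-ordered holdout and a random holdout; it is not the protocol of Cai & Ye [6], whose details are not given here. The helper names `temporal_split` and `random_split`, the per-row timestamps, and the 20% validation fraction are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def temporal_split(
    timestamps: Sequence[float], val_fraction: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: hold out the most recent rows for validation.

    Rows are ordered by timestamp; the latest `val_fraction` of them form the
    validation set, so validation data always lies later in time than training data.
    """
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n_val = int(round(len(order) * val_fraction))
    cut = len(order) - n_val
    return order[:cut], order[cut:]


def random_split(
    n_rows: int, val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: shuffle row indices and hold out a random subset."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)  # seeded so the split is reproducible
    n_val = int(round(n_rows * val_fraction))
    return idx[n_val:], idx[:n_val]


# Usage: five rows with (illustrative) timestamps 5, 1, 4, 2, 3.
train_idx, val_idx = temporal_split([5, 1, 4, 2, 3], val_fraction=0.4)
# train_idx → [1, 3, 4] (timestamps 1, 2, 3); val_idx → [2, 0] (timestamps 4, 5)
```

Under a random split, validation rows are drawn from the same period as the training rows, so validation scores can overstate performance under temporal shift; this is the kind of effect the Table 7 comparison is meant to expose.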
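The Experiment Setup row states that models are selected by their best validation performance, with early stopping at a patience of 16 epochs. A minimal, framework-free sketch of that patience rule follows. The function name `early_stopping_best_epoch` is hypothetical, and it assumes a higher-is-better validation score (negate a lower-is-better metric such as RMSE before passing it in). It replays a precomputed list of per-epoch scores instead of training a model.

```python
from typing import Iterable, Tuple


def early_stopping_best_epoch(
    val_scores: Iterable[float], patience: int = 16
) -> Tuple[int, float, int]:
    """Hypothetical helper returning (best_epoch, best_score, epochs_run).

    Assumes higher scores are better. Training stops once `patience`
    consecutive epochs pass without a strict improvement over the best
    validation score seen so far.
    """
    best_epoch, best_score = -1, float("-inf")
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, score in enumerate(val_scores):
        epochs_run = epoch + 1
        if score > best_score:
            # New best: remember it (a real loop would checkpoint weights here).
            best_epoch, best_score = epoch, score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # patience exhausted
    return best_epoch, best_score, epochs_run


# Usage: validation score peaks at epoch 2, then plateaus.
scores = [0.70, 0.72, 0.75] + [0.74] * 20
print(early_stopping_best_epoch(scores, patience=16))  # (2, 0.75, 19)
```

In an actual training loop, the weights saved at `best_epoch` would be restored before evaluation, which corresponds to selecting the model with the best validation performance.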