Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Feature-aware Modulation for Learning from Temporal Tabular Data

Authors: Haorun Cai, Han-Jia Ye

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Benchmark evaluations validate the effectiveness of our method in handling temporal shifts in tabular data. ... Experiments demonstrate the effectiveness of our approach in handling temporal shifts across real-world datasets. ... We conduct experiments on the TabReD benchmark proposed by Rubachev et al. [41] ... Our main results on the TabReD benchmark [41] are summarized in Table 1. ... The ablation study in Section 6.4 and Table 2 further confirms the effectiveness of applying modulation at multiple levels of the MLP.
Researcher Affiliation | Academia | Hao-Run Cai, Han-Jia Ye; School of Artificial Intelligence, Nanjing University, China; National Key Laboratory for Novel Software Technology, Nanjing University, China
Pseudocode | No | The paper describes the "Feature-aware Temporal Modulation" mechanism with a mathematical formula (Equation 2) and a figure (Figure 3) illustrating the framework. However, it does not present a formal "Algorithm" or "Pseudocode" block.
Open Source Code | Yes | We release the complete implementation of our method at the following repository: https://github.com/LAMDA-Tabular/Tabular-Temporal-Modulation.
Open Datasets | Yes | We use the TabReD [41] benchmark to evaluate the performance of our models. Furthermore, we adopt the refined training protocol and the data preprocessing procedures proposed by Cai & Ye [6].
Dataset Splits | Yes | Given the absence of a standardized training protocol for temporal tabular data, we further conduct a comparative analysis of model performance under different training setups. Table 7 reports the results of each method when trained and evaluated on randomly split training and validation sets, instead of the temporal training protocol proposed by Cai & Ye [6]. In addition, Table 8 presents the results obtained under the original protocol proposed by Rubachev et al. [41].
Hardware Specification | Yes | All deep learning methods were trained on 20 NVIDIA RTX 4090 (24 GB) GPUs. Classical machine learning methods were executed on 4 Intel Xeon Platinum 8352S CPUs.
Software Dependencies | Yes | Our experiments are run under Linux using Python 3.10 and PyTorch 2.0.1.
Experiment Setup | Yes | Our preprocessing, training, evaluation, and hyperparameter tuning setup follows the practices established in Cai & Ye [6]. Detailed experimental setup is provided in Section B. ... Hyperparameter optimization is performed using Optuna [1], with 100 trials for most methods. Due to computational constraints, FT-Transformer and TabR are tuned with 25 trials. The search space strictly follows the configurations used in Cai & Ye [6] and Rubachev et al. [41], and is also documented in our source code (available in the config/ folder). ... For all deep learning methods, we use a batch size of 1024 and the AdamW optimizer [28]. ... Model selection is based on the best performance on the validation set. Following Rubachev et al. [41] and Cai & Ye [6], we adopt an early stopping strategy with a patience of 16 epochs based on validation performance.
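The Dataset Splits row contrasts a temporal training protocol with randomly split training and validation sets. The difference can be illustrated with a minimal Python sketch. It assumes each row carries a single scalar timestamp and that the most recent fraction of rows is held out for validation; the helpers `temporal_split` and `random_split` and the `val_fraction` parameter are hypothetical, and this is not the exact protocol of Cai & Ye [6] or Rubachev et al. [41].

```python
import random
from typing import List, Sequence, Tuple


def temporal_split(
    timestamps: Sequence[float], val_fraction: float = 0.2
) -> Tuple[List[int], List[int]]:
    """Hold out the most recent rows for validation (hypothetical helper)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    # Row indices ordered from oldest to newest; sorted() is stable for ties.
    order = sorted(range(len(timestamps)), key=lambda i: timestamps[i])
    n_val = max(1, int(round(len(order) * val_fraction)))
    if n_val >= len(order):
        raise ValueError("not enough rows for a non-empty training set")
    return order[:-n_val], order[-n_val:]


def random_split(
    n_rows: int, val_fraction: float = 0.2, seed: int = 0
) -> Tuple[List[int], List[int]]:
    """Shuffle rows and ignore time entirely (hypothetical helper)."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    n_val = max(1, int(round(n_rows * val_fraction)))
    if n_val >= n_rows:
        raise ValueError("not enough rows for a non-empty training set")
    return idx[n_val:], idx[:n_val]


if __name__ == "__main__":
    ts = [5, 1, 4, 2, 3]
    print(temporal_split(ts, val_fraction=0.4))  # ([1, 3, 4], [2, 0])
```

Under the temporal split every validation row is later than every training row, so validation scores reflect performance under temporal shift. The random split mixes time periods in both sets and may therefore give an optimistic estimate when the data distribution drifts over time.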
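The Experiment Setup row states that training uses early stopping with a patience of 16 epochs and that the model with the best validation performance is selected. A minimal, framework-agnostic sketch of that rule follows, assuming one validation score is recorded per epoch; `early_stopping_epoch` and its arguments are hypothetical names and are not taken from the released repository.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(
    val_scores: Sequence[float], patience: int = 16, higher_is_better: bool = True
) -> Tuple[int, int]:
    """Return (best_epoch, stop_epoch) for a per-epoch validation score history.

    Training stops once `patience` consecutive epochs fail to improve on the
    best score so far; the checkpoint from `best_epoch` is the one kept.
    """
    if not val_scores:
        raise ValueError("val_scores must be non-empty")
    best_epoch, best_score = 0, val_scores[0]
    epochs_without_improvement = 0
    for epoch in range(1, len(val_scores)):
        score = val_scores[epoch]
        improved = score > best_score if higher_is_better else score < best_score
        if improved:
            best_epoch, best_score = epoch, score
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return best_epoch, epoch  # patience exhausted: stop here
    return best_epoch, len(val_scores) - 1  # ran out of epochs first
```

For example, with validation scores [0.50, 0.60, 0.70] followed by twenty epochs at 0.65, the function returns (2, 18): the epoch-2 checkpoint is kept and training halts after 16 consecutive non-improving epochs. For error metrics such as RMSE, pass `higher_is_better=False`.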