Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Learning Robust Spectral Dynamics for Temporal Domain Generalization

Authors: En Yu, Jie Lu, Xiaoyu Yang, Guangquan Zhang, Zhen Fang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments demonstrate FreKoo's significant superiority over state-of-the-art TDG methods, particularly excelling in real-world streaming scenarios with complex drifts and uncertainties.
Researcher Affiliation	Academia	Australian Artificial Intelligence Institute (AAII), University of Technology Sydney, Australia. EMAIL; EMAIL
Pseudocode	Yes	Algorithm 1 FreKoo End-to-End Learning Procedure
Open Source Code	Yes	The code is available at https://github.com/isenyu/FreKoo.
Open Datasets	Yes	Following [8], we evaluate FreKoo on seven datasets with various drift types. In classification, the synthetic Rotated-Moons and Rot-MNIST benchmarks create incremental drift through steadily increasing rotation angles, whereas the real-world streams ONP, Shuttle, and Elec2 exhibit incremental, periodic or unknown drifts, respectively. For regression, House Prices and Appliance Energy also reflect a real-world non-stationary. These diverse datasets, featuring various drifts and real-world uncertainties, form a comprehensive test bed (details in Appendix C.1). C.1 Datasets: Rotated 2 Moons. Rotated MNIST. Online News Popularity (ONP, https://archive.ics.uci.edu/dataset/332/online+news+popularity). This dataset aggregates heterogeneous features of articles published by Mashable over two years, aiming to predict social media shares (popularity). It comprises 39,797 samples with 58 features, where concept drift is characterized by temporal shifts in popularity patterns. We partition the data into 6 time-ordered domains, using the first five for training and the last for testing. Shuttle (https://archive.ics.uci.edu/dataset/148/statlog+shuttle). The Shuttle dataset contains 58,000 instances of multi-class flight status classification under severe class imbalance. Electrical Demand (https://web.archive.org/web/20191121102533/http://www.inescporto.pt/~jgama/ales/ales_5.html). This dataset records electricity demand in a province, addressing a binary classification task to predict whether 30-minute demand exceeds or falls below the daily average for that time period. House Prices (https://www.kaggle.com/datasets/htagholdings/property-sales). This dataset comprises housing price records from 2013 to 2019 for the regression task to predict property prices based on feature values. Appliances Energy Prediction (https://archive.ics.uci.edu/dataset/374/appliances+energy+prediction). This dataset addresses regression modeling for predicting appliance energy consumption in a low-energy building.
Dataset Splits	Yes	Rotated 2 Moons. ...divided into 10 sequential domains. Each domain is rotated 18° counter-clockwise relative to the previous one. We train on domains 0-8 and test on domain 9... Rotated MNIST. ...constructed a total of five domains... The first four rotated domains are used for training, while the fifth domain serves as the test set... Online News Popularity (ONP). ...We partition the data into 6 time-ordered domains, using the first five for training and the last for testing. Shuttle. ...partitioned into 8 time-stamped domains using a chronological split: domains spanning timestamps 30-70 serve as training data, while the most recent period (70-80) is reserved for testing. Electrical Demand. ...It is partitioned into 30 two-week chronological domains, with the first 29 used for training and the 30th for testing. House Prices. ...We treat each calendar year as a distinct domain, using 2013-2018 data for training and the final (2019) domain for testing. Appliances Energy Prediction. ...partitioned into 9 chronological domains. We train on the first eight domains and evaluate on the final (most recent) ninth domain...
Hardware Specification	Yes	All experiments were conducted on a server with 187GB of memory, an Intel(R) Xeon(R) Gold 6226R CPU@2.90GHz, and two A100 GPUs.
Software Dependencies	No	We adopt the Adam optimizer across all datasets, with distinct learning rates for the prediction module lr_pre, encoder-decoder module lr_co, and Koopman module lr_ko.
Experiment Setup	Yes	The architecture and implementation of backbones and prediction models for all datasets align with DRAIN [8]. Specially, both the encoders and decoder employ a 4-layer MLP architecture with layer dimensions [1024, 512, 128, m], where m = 32 denotes the dimension of the Koopman operator. All experiments were conducted on a server with 187GB of memory, an Intel(R) Xeon(R) Gold 6226R CPU@2.90GHz, and two A100 GPUs. We adopt the Adam optimizer across all datasets, with distinct learning rates for the prediction module lr_pre, encoder-decoder module lr_co, and Koopman module lr_ko. For the 2-Moons dataset, the coder and Koopman learning rates are set to lr_co = 1 × 10^-3 and prediction learning rate lr_pre = 1 × 10^-2, regulated by τ = 0.9, α = 10, β = γ = 1. The Rot-MNIST configuration retains lr_pre = lr_co = lr_ko = 1 × 10^-3 and τ = 0.9, α = 0.1, β = γ = 1. For ONP, we use lr_co = 1 × 10^-2 for coder/Koopman and lr_pre = 1 × 10^-3 for prediction, combined with τ = 0.8, α = 0.1, β = 1, γ = 0.01. The Shuttle dataset employs a uniform learning rate 1 × 10^-3 for all modules, and τ = 0.9, α = β = γ = 1. For Elec2, lr_pre = 1 × 10^-2, lr_co = 1 × 10^-4 and lr_ko = 1 × 10^-3 governed by τ = 0.1, α = 10, β = 0.1, γ = 1. The House dataset shares the coder/Koopman learning rate 1 × 10^-3 with prediction rate 1 × 10^-2 with τ = 0.3, α = 0.1, β = 10, γ = 1. Finally, Appliance maintains a uniform learning rate 1 × 10^-3 across all modules with τ = 0.8, α = 1, β = 1, γ = 100.
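
The per-dataset settings and the "train on all but the last domain" split quoted in the table can be collected into a small lookup, sketched below in Python. This is an illustrative summary, not the authors' code: the names `FreKooConfig`, `CONFIGS`, `param_groups`, and `chronological_split` are hypothetical, the learning rates are read as negative powers of ten, and the 2-Moons, ONP, and House entries assume that the Koopman rate equals the stated coder rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FreKooConfig:
    """Hypothetical container for the hyperparameters quoted in the table."""
    lr_pre: float  # prediction module learning rate
    lr_co: float   # encoder-decoder ("coder") learning rate
    lr_ko: float   # Koopman module learning rate
    tau: float
    alpha: float
    beta: float
    gamma: float


# Values transcribed from the "Experiment Setup" row.
# Assumption: where only a shared coder/Koopman rate is given, lr_ko = lr_co.
CONFIGS = {
    "2-Moons":   FreKooConfig(1e-2, 1e-3, 1e-3, 0.9, 10.0, 1.0, 1.0),
    "Rot-MNIST": FreKooConfig(1e-3, 1e-3, 1e-3, 0.9, 0.1, 1.0, 1.0),
    "ONP":       FreKooConfig(1e-3, 1e-2, 1e-2, 0.8, 0.1, 1.0, 0.01),
    "Shuttle":   FreKooConfig(1e-3, 1e-3, 1e-3, 0.9, 1.0, 1.0, 1.0),
    "Elec2":     FreKooConfig(1e-2, 1e-4, 1e-3, 0.1, 10.0, 0.1, 1.0),
    "House":     FreKooConfig(1e-2, 1e-3, 1e-3, 0.3, 0.1, 10.0, 1.0),
    "Appliance": FreKooConfig(1e-3, 1e-3, 1e-3, 0.8, 1.0, 1.0, 100.0),
}


def param_groups(cfg):
    """Return one learning-rate entry per module, in a fixed order."""
    return [
        {"module": "prediction", "lr": cfg.lr_pre},
        {"module": "encoder_decoder", "lr": cfg.lr_co},
        {"module": "koopman", "lr": cfg.lr_ko},
    ]


def chronological_split(domains):
    """Split time-ordered domains: all but the last for training, the last for testing."""
    if len(domains) < 2:
        raise ValueError("need at least one training domain and one test domain")
    return list(domains[:-1]), domains[-1]


if __name__ == "__main__":
    # Rotated 2 Moons: 10 domains, train on 0-8 and test on 9.
    train, test = chronological_split(list(range(10)))
    print(train, test)
    print(param_groups(CONFIGS["Elec2"]))
```

Each entry returned by `param_groups` could be extended with a `params` key and handed to an optimizer that supports per-group learning rates, such as Adam in PyTorch; how the released code actually wires this up is not stated in the table.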