Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Continuous Domain Generalization

Authors: Zekun CAI, Yiheng YAO, Guangji Bai, Renhe Jiang, Xuan Song, Ryosuke Shibasaki, Liang Zhao

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on synthetic and real-world datasets, including remote sensing, scientific documents, and traffic forecasting, demonstrate that our method significantly outperforms existing baselines in both generalization accuracy and robustness. Code is available at: https://github.com/Zekun-Cai/NeuralLio.
Researcher Affiliation	Collaboration	1Jilin University, Changchun, China 2The University of Tokyo, Tokyo, Japan 3Emory University, Atlanta, GA, USA 4LocationMind, Tokyo, Japan
Pseudocode	Yes	Algorithm 1: Training and Inference Procedure of NeuralLio
Open Source Code	Yes	Code is available at: https://github.com/Zekun-Cai/NeuralLio.
Open Datasets	Yes	Synthetic Datasets. Two synthetic datasets are employed to simulate continuous domain shifts under interpretable variations. In the 2-Moons dataset, each domain is generated by applying scaling and rotation to the base moon shape. 2. MNIST This dataset is a variant of the classic MNIST dataset [12], where each domain consists of 1,000 digit images randomly sampled from the original MNIST. 3. fMoW The fMoW dataset [10] consists of over 1 million high-resolution satellite images collected globally between 2002 and 2018. 4. Arxiv The arXiv dataset [11] comprises 1.5 million pre-prints over 28 years, spanning fields such as physics, mathematics, and computer science. 5. YearBook The Yearbook dataset [55] contains frontal portraits from U.S. high school yearbooks spanning 1930 to 2013. 6. Traffic We use a real-world taxi flow dataset [41] collected from Beijing, covering the period from February to June 2015.
Dataset Splits	Yes	1. 2-Moons ...We train on 50 randomly sampled domains, and evaluate it on 150 additional randomly sampled domains, together with extra test domains uniformly sampled over a mesh grid in the descriptor space (see Fig. 5). 2. MNIST ...We randomly sample 50 domains for training and another 50 for testing. 3. fMoW ...We randomly select 50 domains for training and use the remaining for testing. 4. Arxiv ...We randomly select 40 domains for training and use the remaining for testing. 5. YearBook ...The first 28 domains are used for training and the remaining for testing. 6. Traffic ...We select 100 domains for training and use the remaining for testing.
Hardware Specification	Yes	All experiments are conducted on a 64-bit machine with two 20-core Intel Xeon Silver 4210R CPUs @ 2.40GHz, 378GB memory, and four NVIDIA GeForce RTX 3090 GPUs.
Software Dependencies	No	The paper does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries used for implementation. It only mentions using the Adam optimizer.
Experiment Setup	Yes	1. 2-Moons The predictive model is a three-layer MLP with 50 hidden units per layer and ReLU activations. The encoder and decoder are both four-layer MLPs with layer dimensions [1024, 512, 128, 32]. The transport operator consists of a 32-dimensional linear field network with 2 basis matrices. The learning rate is set to 1 × 10⁻³. 2. MNIST The shared feature extractor is a convolutional backbone composed of three convolutional layers with channels [32, 32, 64], each followed by a ReLU activation and a max pooling layer with kernel size 2. ...The per-domain predictive model is a two-layer MLP with a hidden dimension of 128 and an output dimension of 10. ...The learning rate is set to 1 × 10⁻³ for all components. 3. fMoW ResNet-50 backbone pretrained on ImageNet as the shared feature extractor...The extracted features are fed into a per-domain predictive model implemented as a three-layer MLP with hidden dimensions [128, 64] and an output dimension of 10. ...The learning rate is set to 1 × 10⁻³. 4. Arxiv Each paper title is first embedded using a Sentence Transformer encoder...The per-domain predictive model is a three-layer MLP with hidden dimensions [50, 50] and output dimension 10. ...The learning rate is set to 1 × 10⁻³ for all components. 5. YearBook The shared feature extractor is a convolutional backbone composed of three convolutional layers with channels [32, 32, 64]...The per-domain predictive model is a three-layer MLP with hidden dimensions [128, 32] and output dimension 2. ...The learning rate is set to 1 × 10⁻³. 6. Traffic The predictive model is a three-layer MLP that takes as input a flattened 96-dimensional vector representing 48 historical inflow and outflow pairs, and outputs a 6-dimensional vector corresponding to 3-step future predictions. The hidden dimension is set to 64. ...The learning rate is set to 1 × 10⁻³ for all components.
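
Illustrative sketch (not taken from the paper or its repository): the 2-Moons entries in the table describe domains generated by scaling and rotating a base moon shape, with 50 randomly sampled training domains and 150 randomly sampled test domains. The standard-library Python below shows one way such domains could be produced. The function names, the base-shape parametrization, the number of points, and the descriptor ranges (scale in [0.5, 2.0], angle in [0, π]) are all assumptions made for illustration; the authors' actual generation procedure may differ.

```python
import math
import random


def base_moons(n_per_class=100):
    """Return (points, labels) for two interleaving half-circles (assumed base shape)."""
    points, labels = [], []
    for i in range(n_per_class):
        t = math.pi * i / (n_per_class - 1)
        points.append((math.cos(t), math.sin(t)))              # upper moon, class 0
        labels.append(0)
        points.append((1.0 - math.cos(t), 0.5 - math.sin(t)))  # lower moon, class 1
        labels.append(1)
    return points, labels


def make_domain(points, scale, angle):
    """Rotate each point by `angle` radians about the origin, then scale uniformly."""
    c, s = math.cos(angle), math.sin(angle)
    return [(scale * (c * x - s * y), scale * (s * x + c * y)) for x, y in points]


def sample_descriptors(n, rng, scale_range=(0.5, 2.0), angle_range=(0.0, math.pi)):
    """Draw n (scale, angle) descriptors uniformly; both ranges are hypothetical."""
    return [(rng.uniform(*scale_range), rng.uniform(*angle_range)) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
points, labels = base_moons()

# 50 training domains and 150 test domains, matching the counts quoted in the table.
train_descriptors = sample_descriptors(50, rng)
test_descriptors = sample_descriptors(150, rng)
train_domains = [make_domain(points, s, a) for s, a in train_descriptors]
test_domains = [make_domain(points, s, a) for s, a in test_descriptors]

print(len(train_domains), len(test_domains), len(train_domains[0]))
```

Each domain here shares the same labels and differs only through its (scale, angle) descriptor, which is the sense in which the domain shift is continuous and interpretable. The additional mesh-grid test domains mentioned in the Dataset Splits row are not covered by this sketch.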