Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Expert Routing with Synthetic Data for Domain Incremental Learning

Authors: Yewon Byun, Sanket Vaibhav Mehta, Saurabh Garg, Emma Strubell, Michael Oberst, Bryan Wilder, Zachary Chase Lipton

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We observe that G2D outperforms competitive domain-incremental learning methods on tasks in both vision and language modalities, providing a new perspective on the use of synthetic data in the lifelong learning literature. [...] We observe that G2D outperforms competitive methods in the considered vision and text benchmarks: DomainNet, CORe50, DermCL, and Question Answering (Tables 1 and 2)."
Researcher Affiliation | Academia | "Yewon Byun1, Sanket Vaibhav Mehta1, Saurabh Garg1, Emma Strubell1, Michael Oberst2, Bryan Wilder1, Zachary C. Lipton1; 1Carnegie Mellon University, 2Johns Hopkins University"
Pseudocode | Yes | "Algorithm 1: Generate to Discriminate (G2D)"
Open Source Code | No | The text does not explicitly state that source code for the described methodology is released, nor does it provide a direct link to a code repository.
Open Datasets | Yes | "For our vision experiments, we look at standard domain-incremental benchmarks: DomainNet (Peng et al., 2019) and CORe50 (Lomonaco & Maltoni, 2017), and DermCL (see 5.1), our newly introduced benchmark curated from real-world dermatology tasks (Tschandl et al., 2018; Cassidy et al., 2022; Pacheco et al., 2020; Daneshjou et al., 2022). [...] All four datasets in the sequence are publicly available at https://github.com/rajpurkarlab/BenchMD (Wantlin et al., 2023) (for HAM10000, BCN2000, PAD-UFES-20) and https://ddi-dataset.github.io (Daneshjou et al., 2022) (for DDI)."
Dataset Splits | Yes | "Table 4: Per-domain statistics for the domain-incremental learning benchmarks used in both vision and text experiments." (Columns: Dataset, Domain, Train, Validation, Test.)
Hardware Specification | Yes | "All experiments were conducted using NVIDIA RTX A6000 and NVIDIA RTX 6000 Ada graphics cards."
Software Dependencies | Yes | "For our generator for the text domain, we use a pretrained T5-Large v1.1 model (Raffel et al., 2020) and optimize via prompt tuning (Lester et al., 2021), which learns continuous input token embeddings (see B.1 and C.2 for further details)."
Experiment Setup | Yes | "Section C: Hyperparameter Details. Table 7: Finetuning hyperparameters for the generative models in our vision experiments. Table 8: Finetuning hyperparameters for the classifiers in our vision experiments. Table 9: Finetuning hyperparameters for the generative models in our text experiments. Table 10: Finetuning hyperparameters for the classifiers in our text experiments."
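The Software Dependencies row notes that the paper's text generator is adapted via prompt tuning (Lester et al., 2021), which learns continuous input token embeddings while the backbone stays frozen. The sketch below illustrates only that general idea in a toy setting; the embedding dimension, prompt length, vocabulary, and quadratic loss are all illustrative assumptions, not the paper's actual T5 setup.

```python
# Minimal, illustrative sketch of prompt tuning: a few continuous
# "soft prompt" vectors are prepended to the frozen token embeddings,
# and only those prompt vectors receive gradient updates.
import random

EMB_DIM = 4     # toy embedding size (assumption)
PROMPT_LEN = 2  # number of soft-prompt vectors (assumption)

# Frozen token-embedding table, standing in for the pretrained model.
vocab_embeddings = {
    "hello": [0.1, 0.2, 0.3, 0.4],
    "world": [0.4, 0.3, 0.2, 0.1],
}

# Learnable soft prompt: PROMPT_LEN continuous embedding vectors.
random.seed(0)
soft_prompt = [[random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
               for _ in range(PROMPT_LEN)]

def forward(tokens):
    """Prepend the soft prompt to the (frozen) token embeddings."""
    return soft_prompt + [vocab_embeddings[t] for t in tokens]

def loss_and_grad(target):
    """Toy quadratic loss pulling each prompt vector toward `target`.
    Its gradient touches only the soft prompt, never the vocab table."""
    loss = sum((p[i] - target[i]) ** 2
               for p in soft_prompt for i in range(EMB_DIM))
    grads = [[2 * (p[i] - target[i]) for i in range(EMB_DIM)]
             for p in soft_prompt]
    return loss, grads

target = [0.5] * EMB_DIM
lr = 0.1
for _ in range(100):
    loss, grads = loss_and_grad(target)
    for p, g in zip(soft_prompt, grads):
        for i in range(EMB_DIM):
            p[i] -= lr * g[i]

print(len(forward(["hello"])))  # prompt length + number of tokens
```

In the real method, the frozen table is the T5 embedding matrix and the loss is the model's language-modeling objective; only the mechanics of "trainable prefix embeddings, frozen everything else" carry over from this sketch.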