Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Expert Routing with Synthetic Data for Domain Incremental Learning

Authors: Yewon Byun, Sanket Vaibhav Mehta, Saurabh Garg, Emma Strubell, Michael Oberst, Bryan Wilder, Zachary Chase Lipton

TMLR 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We observe that G2D outperforms competitive domain-incremental learning methods on tasks in both vision and language modalities, providing a new perspective on the use of synthetic data in the lifelong learning literature. [...] We observe that G2D outperforms competitive methods in the considered vision and text benchmarks: DomainNet, CORe50, DermCL, and Question Answering (Tables 1 and 2)."
Researcher Affiliation | Academia | "Yewon Byun1, Sanket Vaibhav Mehta1, Saurabh Garg1, Emma Strubell1, Michael Oberst2, Bryan Wilder1, Zachary C. Lipton1; 1Carnegie Mellon University, 2Johns Hopkins University"
Pseudocode | Yes | "Algorithm 1: Generate to Discriminate (G2D)"
Open Source Code | No | The text does not explicitly state that source code for the described methodology is released, nor does it provide a direct link to a code repository.
Open Datasets | Yes | "For our vision experiments, we look at standard domain-incremental benchmarks: DomainNet (Peng et al., 2019) and CORe50 (Lomonaco & Maltoni, 2017), and DermCL (see 5.1), our newly introduced benchmark curated from real-world dermatology tasks (Tschandl et al., 2018; Cassidy et al., 2022; Pacheco et al., 2020; Daneshjou et al., 2022). [...] All four datasets in the sequence are publicly available at https://github.com/rajpurkarlab/BenchMD (Wantlin et al., 2023) (for HAM10000, BCN2000, PAD-UFES-20) and https://ddi-dataset.github.io (Daneshjou et al., 2022) (for DDI)."
Dataset Splits | Yes | "Table 4: Per-domain statistics for the domain-incremental learning benchmarks used in both vision and text experiments." (Columns: Dataset, Domain, Train, Validation, Test.)
Hardware Specification | Yes | "All experiments were conducted using NVIDIA RTX A6000 and NVIDIA RTX 6000 Ada graphics cards."
Software Dependencies | Yes | "For our generator for the text domain, we use a pretrained T5-Large v1.1 model (Raffel et al., 2020) and optimize via prompt tuning (Lester et al., 2021), which learns continuous input token embeddings (see B.1 and C.2 for further details)."
Experiment Setup | Yes | "Section C: Hyperparameter Details. Table 7: Finetuning hyperparameters for the generative models in our vision experiments. Table 8: Finetuning hyperparameters for the classifiers in our vision experiments. Table 9: Finetuning hyperparameters for the generative models in our text experiments. Table 10: Finetuning hyperparameters for the classifiers in our text experiments."
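The Software Dependencies row notes that the paper's text generator is adapted via prompt tuning (Lester et al., 2021), which learns continuous input token embeddings while the backbone stays frozen. The sketch below illustrates only that general idea in a toy setting; the embedding dimension, prompt length, vocabulary, and quadratic loss are all illustrative assumptions, not the paper's actual T5 setup.

```python
# Minimal, illustrative sketch of prompt tuning: a few continuous
# "soft prompt" vectors are prepended to the frozen token embeddings,
# and only those prompt vectors receive gradient updates.
import random

EMB_DIM = 4     # toy embedding size (assumption)
PROMPT_LEN = 2  # number of soft-prompt vectors (assumption)

# Frozen token-embedding table, standing in for the pretrained model.
vocab_embeddings = {
    "hello": [0.1, 0.2, 0.3, 0.4],
    "world": [0.4, 0.3, 0.2, 0.1],
}

# Learnable soft prompt: PROMPT_LEN continuous embedding vectors.
random.seed(0)
soft_prompt = [[random.uniform(-0.1, 0.1) for _ in range(EMB_DIM)]
               for _ in range(PROMPT_LEN)]

def forward(tokens):
    """Prepend the soft prompt to the (frozen) token embeddings."""
    return soft_prompt + [vocab_embeddings[t] for t in tokens]

def loss_and_grad(target):
    """Toy quadratic loss pulling each prompt vector toward `target`.
    Its gradient touches only the soft prompt, never the vocab table."""
    loss = sum((p[i] - target[i]) ** 2
               for p in soft_prompt for i in range(EMB_DIM))
    grads = [[2 * (p[i] - target[i]) for i in range(EMB_DIM)]
             for p in soft_prompt]
    return loss, grads

target = [0.5] * EMB_DIM
lr = 0.1
for _ in range(100):
    loss, grads = loss_and_grad(target)
    for p, g in zip(soft_prompt, grads):
        for i in range(EMB_DIM):
            p[i] -= lr * g[i]

print(len(forward(["hello"])))  # prompt length + number of tokens
```

In the real method, the frozen table is the T5 embedding matrix and the loss is the model's language-modeling objective; only the mechanics of "trainable prefix embeddings, frozen everything else" carry over from this sketch.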