Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Expert Routing with Synthetic Data for Domain Incremental Learning
Authors: Yewon Byun, Sanket Vaibhav Mehta, Saurabh Garg, Emma Strubell, Michael Oberst, Bryan Wilder, Zachary Chase Lipton
TMLR 2025
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We observe that G2D outperforms competitive domain-incremental learning methods on tasks in both vision and language modalities, providing a new perspective on the use of synthetic data in the lifelong learning literature. [...] We observe that G2D outperforms competitive methods in the considered vision and text benchmarks: DomainNet, CORe50, DermCL, and Question Answering (Table 1, 2). |
| Researcher Affiliation | Academia | Yewon Byun1, Sanket Vaibhav Mehta1, Saurabh Garg1, Emma Strubell1, Michael Oberst2, Bryan Wilder1, Zachary C. Lipton1 1Carnegie Mellon University 2Johns Hopkins University |
| Pseudocode | Yes | Algorithm 1: Generate to Discriminate (G2D) |
| Open Source Code | No | The text does not explicitly provide a statement about releasing source code for the methodology described in this paper, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | For our vision experiments, we look at standard domain-incremental benchmarks: DomainNet (Peng et al., 2019) and CORe50 (Lomonaco & Maltoni, 2017), and DermCL (see 5.1), our newly introduced benchmark curated from real-world dermatology tasks (Tschandl et al., 2018; Cassidy et al., 2022; Pacheco et al., 2020; Daneshjou et al., 2022). [...] All four datasets in the sequence are publicly available at https://github.com/rajpurkarlab/BenchMD Wantlin et al. (2023) (for HAM10000, BCN2000, PAD-UFES-20) and https://ddi-dataset.github.io Daneshjou et al. (2022) (for DDI). |
| Dataset Splits | Yes | Table 4: Per-domain statistics for the domain-incremental learning benchmarks used in both vision and text experiments (columns: Dataset, Domain, Train, Validation, Test). |
| Hardware Specification | Yes | All experiments were conducted using NVIDIA RTX A6000 and NVIDIA RTX 6000Ada graphics cards. |
| Software Dependencies | Yes | For our generator for the text domain, we use a pretrained T5-Large v1.1 model (Raffel et al., 2020) and optimize via prompt tuning (Lester et al., 2021), which learns continuous input token embeddings (see B.1 and C.2 for further details). |
| Experiment Setup | Yes | Section C Hyperparameter Details, Table 7: Finetuning hyperparameters for the generative models in our vision experiments. Table 8: Finetuning hyperparameters for the classifiers in our vision experiments. Table 9: Finetuning hyperparameters for the generative models in our text experiments. Table 10: Finetuning hyperparameters for the classifiers in our text experiments. |