Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Mixture of Experts Meets Prompt-Based Continual Learning
Authors: Minh Le, An Nguyen The, Huy Nguyen, Trang Nguyen, Trang Pham, Linh Ngo, Nhat Ho
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across various continual learning benchmarks and pre-training settings demonstrate that our approach achieves state-of-the-art performance compared to existing methods. (Quoted from the Introduction; see also Section 5, Experiments.) |
| Researcher Affiliation | Collaboration | 1 The University of Texas at Austin 2 Hanoi University of Science and Technology 3 VinAI Research |
| Pseudocode | Yes | Algorithm 1 HiDe-Prompt's training algorithm (Appendix D) |
| Open Source Code | Yes | Our code is publicly available at https://github.com/Minhchuyentoancbn/MoE_PromptCL. |
| Open Datasets | Yes | We evaluate various continual learning methods on widely used CIL benchmarks, including Split CIFAR-100 [23] and Split ImageNet-R [23], consistent with prior work [49]. We further explore the model's performance on fine-grained classification tasks with Split CUB-200 [48] and large inter-task differences with 5-Datasets [9]. |
| Dataset Splits | No | The paper mentions 'Split CIFAR-100', 'Split ImageNet-R', and 'Split CUB-200', which are common benchmarks in continual learning, and shows 'Validation loss' in Figure 3. However, it does not explicitly state the exact train/validation/test split percentages, sample counts, or detailed methodology for these splits. |
| Hardware Specification | Yes | We train and test on one NVIDIA A100 GPU for baselines and our method. |
| Software Dependencies | No | The paper states that 'Training employs an Adam optimizer (β1 = 0.9, β2 = 0.999)' and 'We leverage a pre-trained ViT-B/16 model as the backbone', but it does not specify software dependencies with version numbers (e.g., Python version, PyTorch version, CUDA version). |
| Experiment Setup | Yes | Training employs an Adam optimizer (β1 = 0.9, β2 = 0.999), a batch size of 128, and a constant learning rate of 0.005 for all methods except CODA-Prompt. CODA-Prompt utilizes a cosine decaying learning rate starting at 0.001. Additionally, a grid search technique was implemented to determine the most appropriate number of epochs for effective training. |
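
The experiment setup quoted in the last row can be captured as a small configuration sketch. This is not the authors' code: the names `TrainConfig`, `config_for`, and `lr_at` are hypothetical, and the cosine schedule is assumed to decay to zero over a fixed number of steps, which the quoted text does not specify. The number of epochs is left out because the paper reports choosing it by grid search.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row (names are hypothetical)."""
    method: str
    batch_size: int = 128
    adam_betas: tuple = (0.9, 0.999)   # Adam beta1, beta2
    base_lr: float = 0.005             # constant LR for all methods except CODA-Prompt
    schedule: str = "constant"


def config_for(method: str) -> TrainConfig:
    """Return the quoted settings; CODA-Prompt is the single stated exception."""
    if method == "CODA-Prompt":
        return TrainConfig(method=method, base_lr=0.001, schedule="cosine")
    return TrainConfig(method=method)


def lr_at(cfg: TrainConfig, step: int, total_steps: int) -> float:
    """Learning rate at a given step.

    Assumption: the cosine schedule decays from base_lr to 0 over total_steps;
    the source only says it is 'cosine decaying' and starts at 0.001.
    """
    if cfg.schedule == "cosine":
        progress = min(step, total_steps) / total_steps
        return cfg.base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))
    return cfg.base_lr


if __name__ == "__main__":
    for name in ("HiDe-Prompt", "CODA-Prompt"):
        cfg = config_for(name)
        print(name, [lr_at(cfg, s, 100) for s in (0, 50, 100)])
```

Keeping the exception in one function makes it easy to check that every baseline other than CODA-Prompt shares the same optimizer settings, which is the comparison the quoted setup describes.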