Mixture of Experts Meets Prompt-Based Continual Learning

Authors: Minh Le, An Nguyen The, Huy Nguyen, Trang Nguyen, Trang Pham, Linh Ngo, Nhat Ho

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Extensive experiments across various continual learning benchmarks and pre-training settings demonstrate that our approach achieves state-of-the-art performance compared to existing methods." (Introduction; see also Section 5, Experiments)
Researcher Affiliation | Collaboration | 1 The University of Texas at Austin; 2 Hanoi University of Science and Technology; 3 VinAI Research
Pseudocode | Yes | Algorithm 1: HiDe-Prompt's training algorithm (Appendix D)
Open Source Code | Yes | "Our code is publicly available at https://github.com/Minhchuyentoancbn/MoE_PromptCL."
Open Datasets | Yes | "We evaluate various continual learning methods on widely used CIL benchmarks, including Split CIFAR-100 [23] and Split ImageNet-R [23], consistent with prior work [49]. We further explore the model's performance on fine-grained classification tasks with Split CUB-200 [48] and large inter-task differences with 5-Datasets [9]."
Dataset Splits | No | The paper mentions 'Split CIFAR-100', 'Split ImageNet-R', and 'Split CUB-200', which are common continual learning benchmarks, and shows validation loss in Figure 3. However, it does not explicitly state the exact train/validation/test split percentages, sample counts, or the methodology used to construct these splits. (A task-split sketch follows the table.)
Hardware Specification | Yes | "We train and test on one NVIDIA A100 GPU for baselines and our method."
Software Dependencies | No | The paper states that 'Training employs an Adam optimizer (β1 = 0.9, β2 = 0.999)' and 'We leverage a pre-trained ViT-B/16 model as the backbone', but it does not specify software dependencies with version numbers (e.g., Python, PyTorch, or CUDA versions). (A backbone-loading sketch follows the table.)
Experiment Setup | Yes | "Training employs an Adam optimizer (β1 = 0.9, β2 = 0.999), a batch size of 128, and a constant learning rate of 0.005 for all methods except CODA-Prompt. CODA-Prompt utilizes a cosine decaying learning rate starting at 0.001. Additionally, a grid search technique was implemented to determine the most appropriate number of epochs for effective training." (A configuration sketch follows the table.)
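On the Dataset Splits row: the Split CIFAR-100 benchmark is conventionally built by partitioning the 100 classes into 10 disjoint tasks of 10 classes each. Since the paper does not spell out how its splits were constructed, the sketch below only illustrates that common convention; the function name make_split_cifar100 and the seeded class ordering are illustrative assumptions, not the authors' code.

```python
import numpy as np
from torch.utils.data import Subset
from torchvision import datasets, transforms

# Hypothetical helper: partition CIFAR-100 into disjoint class-incremental tasks.
# The 10-task / 10-classes-per-task layout is the usual Split CIFAR-100 convention;
# the paper does not report split percentages or per-task sample counts.
def make_split_cifar100(root="./data", num_tasks=10, seed=0, train=True):
    tfm = transforms.Compose([
        transforms.Resize(224),          # ViT-B/16 expects 224x224 inputs
        transforms.ToTensor(),
    ])
    base = datasets.CIFAR100(root=root, train=train, download=True, transform=tfm)

    rng = np.random.default_rng(seed)
    class_order = rng.permutation(100)   # fixed random class order
    classes_per_task = 100 // num_tasks

    targets = np.array(base.targets)
    tasks = []
    for t in range(num_tasks):
        task_classes = class_order[t * classes_per_task:(t + 1) * classes_per_task]
        idx = np.where(np.isin(targets, task_classes))[0]
        tasks.append(Subset(base, idx.tolist()))
    return tasks

# Usage: ten class-incremental tasks drawn from the CIFAR-100 train split.
# train_tasks = make_split_cifar100(train=True)
```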
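On the Software Dependencies row: the paper names a pre-trained ViT-B/16 backbone but not the library used to load it. A minimal sketch, assuming the timm library (an assumption, since no dependency or checkpoint details are given):

```python
import timm

# Illustrative only: an ImageNet pre-trained ViT-B/16 backbone at 224x224 resolution.
# num_classes=0 strips the classification head so the model returns features.
backbone = timm.create_model("vit_base_patch16_224", pretrained=True, num_classes=0)
```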
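On the Experiment Setup row: the reported optimization settings translate into a small configuration sketch. Here prompt_params, use_coda_prompt, and epochs_per_task are placeholder names; the number of epochs was selected by grid search in the paper and is not fixed here.

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR

# Sketch of the reported optimization settings (batch size 128 for all methods).
def build_optimizer(prompt_params, use_coda_prompt=False, epochs_per_task=20):
    if use_coda_prompt:
        # CODA-Prompt: cosine-decaying learning rate starting at 0.001.
        optimizer = Adam(prompt_params, lr=1e-3, betas=(0.9, 0.999))
        scheduler = CosineAnnealingLR(optimizer, T_max=epochs_per_task)
    else:
        # All other methods: constant learning rate of 0.005.
        optimizer = Adam(prompt_params, lr=5e-3, betas=(0.9, 0.999))
        scheduler = None
    return optimizer, scheduler
```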