Mixture of Experts Meets Prompt-Based Continual Learning
Authors: Minh Le, An Nguyen The, Huy Nguyen, Trang Nguyen, Trang Pham, Linh Ngo, Nhat Ho
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments across various continual learning benchmarks and pre-training settings demonstrate that our approach achieves state-of-the-art performance compared to existing methods. (from the Introduction; see also Section 5, Experiments) |
| Researcher Affiliation | Collaboration | 1 The University of Texas at Austin; 2 Hanoi University of Science and Technology; 3 VinAI Research |
| Pseudocode | Yes | Algorithm 1: HiDe-Prompt's training algorithm (Appendix D) |
| Open Source Code | Yes | Our code is publicly available at https://github.com/Minhchuyentoancbn/MoE_PromptCL. |
| Open Datasets | Yes | We evaluate various continual learning methods on widely used CIL benchmarks, including Split CIFAR-100 [23] and Split ImageNet-R [23], consistent with prior work [49]. We further explore the model's performance on fine-grained classification tasks with Split CUB-200 [48] and large inter-task differences with 5-Datasets [9]. |
| Dataset Splits | No | The paper mentions 'Split CIFAR-100', 'Split ImageNet-R', and 'Split CUB-200', which are common benchmarks in continual learning, and shows 'Validation loss' in Figure 3. However, it does not explicitly state the exact train/validation/test split percentages, sample counts, or detailed methodology for these splits. |
| Hardware Specification | Yes | We train and test on one NVIDIA A100 GPU for baselines and our method. |
| Software Dependencies | No | The paper states that 'Training employs an Adam optimizer (β1 = 0.9, β2 = 0.999)' and 'We leverage a pre-trained ViT-B/16 model as the backbone', but it does not specify software dependencies with version numbers (e.g., Python version, PyTorch version, CUDA version). |
| Experiment Setup | Yes | Training employs an Adam optimizer (β1 = 0.9, β2 = 0.999), a batch size of 128, and a constant learning rate of 0.005 for all methods except CODA-Prompt. CODA-Prompt utilizes a cosine decaying learning rate starting at 0.001. Additionally, a grid search technique was implemented to determine the most appropriate number of epochs for effective training. |
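
The hyperparameters quoted in the Experiment Setup row are enough to reconstruct the optimizer configuration. The sketch below shows one way to express it in PyTorch; it is not the authors' code (their implementation is linked in the Open Source Code row), and `prompt_params`, `train_set`, and `num_epochs` are placeholder names, with `num_epochs` assumed to come from the grid search the paper describes.

```python
# Minimal sketch of the reported training configuration (not the authors' code).
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader

def build_training(prompt_params, train_set, num_epochs, coda_prompt=False):
    # Batch size of 128 for all methods.
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    if coda_prompt:
        # CODA-Prompt: cosine-decaying learning rate starting at 0.001.
        optimizer = Adam(prompt_params, lr=1e-3, betas=(0.9, 0.999))
        scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)
    else:
        # All other methods: constant learning rate of 0.005.
        optimizer = Adam(prompt_params, lr=5e-3, betas=(0.9, 0.999))
        scheduler = None
    return loader, optimizer, scheduler
```

In this reading, only CODA-Prompt attaches a scheduler; every other method keeps the constant 0.005 learning rate for the full run, and the epoch count per benchmark is whatever the grid search selected.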