LLMs Can Evolve Continually on Modality for $\mathbb{X}$-Modal Reasoning

Authors: Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate the effectiveness of the proposed AnA framework on learning plasticity and memory stability during continual learning.
Researcher Affiliation | Collaboration | 1 Dalian University of Technology, 2 Huawei Noah's Ark Lab, 3 Tsinghua University, 4 The Hong Kong University of Science and Technology
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code locates at https://github.com/JiazuoYu/PathWeave.
Open Datasets | Yes | We establish a challenging benchmark, Continual Learning on Modality (MCL), which consists of multimodal high-quality QA data to evaluate the effectiveness of our method on continual uni-modal finetuning. These datasets are collected from five distinct modalities: image, video, depth, audio, and point cloud. More details of the dataset list and size for each modality are illustrated in Table A6 of the Appendix.
Dataset Splits | Yes | Table A7 records the detailed hyper-parameters we used during the training and testing process... (Table A7 columns: Modality, Iteration, Batch Size (Train/Val), Learning Rate)
Hardware Specification | Yes | We optimize our model on 4 A800 GPUs (80GB) using AdamW [53] with β1 = 0.9, β2 = 0.999, and a weight decay of 0.05.
Software Dependencies | No | The paper states, 'Our method is built on the LAVIS library's framework [52] atop the Vicuna v1.1 7b [3],' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | We optimize our model on 4 A800 GPUs (80GB) using AdamW [53] with β1 = 0.9, β2 = 0.999, and a weight decay of 0.05. ... Table A7 records the detailed hyper-parameters we used during the training and testing process. ... For all modalities, the learning rate decreases from 1e-5 with a cosine annealing strategy and a 0.5 decay weight. The warm-up phase starts from 1e-8 and lasts for 1000 iterations for all modality training.
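
The hardware, dependency, and experiment-setup rows above describe a fairly standard LAVIS/PyTorch configuration applied sequentially, one modality at a time. The following is a minimal sketch of that protocol, not the authors' implementation: it assumes the backbone is loaded through LAVIS as blip2_vicuna_instruct / vicuna7b (the paper only names the LAVIS library and Vicuna v1.1 7B), that each modality supplies an ordinary PyTorch DataLoader yielding LAVIS-style sample dicts, and that the "0.5 decay weight" means the cosine schedule bottoms out at 0.5 × the peak learning rate.

```python
# Hedged sketch of the quoted training setup; NOT the authors' code.
# Assumed (not stated in the quoted text): LAVIS model name/type, the
# DataLoader format, and the interpretation of "0.5 decay weight".
import math
import torch
from lavis.models import load_model_and_preprocess

MODALITY_ORDER = ["image", "video", "depth", "audio", "point_cloud"]

def lr_multiplier(step, total_iters, warmup_iters=1000,
                  peak_lr=1e-5, warmup_start_lr=1e-8, floor_ratio=0.5):
    """Warm-up from 1e-8 over 1000 iterations, then cosine annealing
    from the 1e-5 peak down to floor_ratio * peak (assumed reading of
    the '0.5 decay weight')."""
    if step < warmup_iters:
        start = warmup_start_lr / peak_lr
        return start + (1.0 - start) * step / warmup_iters
    progress = (step - warmup_iters) / max(1, total_iters - warmup_iters)
    return floor_ratio + (1.0 - floor_ratio) * 0.5 * (1.0 + math.cos(math.pi * progress))

def train_continually(dataloaders, iters_per_modality):
    """Fine-tune on one modality at a time (continual uni-modal setting)."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model, _, _ = load_model_and_preprocess(
        name="blip2_vicuna_instruct", model_type="vicuna7b",
        is_eval=False, device=device)

    for modality in MODALITY_ORDER:
        total_iters = iters_per_modality[modality]  # per-modality counts are in Table A7
        # AdamW with beta1=0.9, beta2=0.999 and weight decay 0.05, as quoted above.
        optimizer = torch.optim.AdamW(
            (p for p in model.parameters() if p.requires_grad),
            lr=1e-5, betas=(0.9, 0.999), weight_decay=0.05)
        scheduler = torch.optim.lr_scheduler.LambdaLR(
            optimizer, lambda s, t=total_iters: lr_multiplier(s, t))

        step = 0
        while step < total_iters:
            for batch in dataloaders[modality]:
                if step >= total_iters:
                    break
                # LAVIS instruct models expect a sample dict (e.g. "image",
                # "text_input", "text_output") and return {"loss": ...}.
                loss = model(batch)["loss"]
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                scheduler.step()
                step += 1
    return model
```

As a usage note, the schedule is driven per iteration (scheduler.step() after every optimizer.step()), matching the quoted 1000-iteration warm-up; batch sizes and iteration counts per modality would come from Table A7 of the paper's appendix.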