LLMs Can Evolve Continually on Modality for $\mathbb{X}$-Modal Reasoning
Authors: Jiazuo Yu, Haomiao Xiong, Lu Zhang, Haiwen Diao, Yunzhi Zhuge, Lanqing Hong, Dong Wang, Huchuan Lu, You He, Long Chen
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the effectiveness of the proposed AnA framework on learning plasticity and memory stability during continual learning. |
| Researcher Affiliation | Collaboration | 1Dalian University of Technology, 2Huawei Noah's Ark Lab, 3Tsinghua University, 4The Hong Kong University of Science and Technology |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code is located at https://github.com/JiazuoYu/PathWeave. |
| Open Datasets | Yes | We establish a challenging benchmark, Continual Learning on Modality (MCL), which consists of multimodal high-quality QA data to evaluate the effectiveness of our method on continual uni-modal finetuning. These datasets are collected from five distinct modalities: image, video, depth, audio and point cloud. More details of the dataset list and size for each modality are illustrated in Table A6 of the Appendix. |
| Dataset Splits | Yes | Table A7 records the detailed hyper-parameters we used during the training and testing process... Its columns list, per modality: iterations, batch size (train/val), and learning rate. |
| Hardware Specification | Yes | We optimize our model on 4 A800 GPUs (80GB) using AdamW [53] with β1 = 0.9, β2 = 0.999, and a weight decay of 0.05. |
| Software Dependencies | No | The paper states, 'Our method is built on the LAVIS library's framework [52] atop the Vicuna v1.1 7B [3].' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | We optimize our model on 4 A800 GPUs (80GB) using AdamW [53] with β1 = 0.9, β2 = 0.999, and a weight decay of 0.05. ... Table A7 records the detailed hyper-parameters we used during the training and testing process. ... All learning rates decay from 1e-5 with a cosine annealing strategy and a 0.5 decay weight; the warm-up phase starts from 1e-8 and lasts for 1000 iterations for all modality training. A hedged sketch of this setup follows the table. |
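
For concreteness, below is a minimal PyTorch sketch of the reported optimizer and schedule: AdamW with β1 = 0.9, β2 = 0.999 and weight decay 0.05, a linear warm-up from 1e-8 over 1000 iterations, and cosine annealing starting at 1e-5. The model, the total iteration count, and the reading of the "0.5 decay weight" (taken here as the floor of the cosine schedule relative to the peak learning rate) are illustrative assumptions, not the authors' exact LAVIS configuration.

```python
# Hedged sketch of the quoted training setup; values other than the optimizer
# hyper-parameters, warm-up length, and peak LR are illustrative assumptions.
import math
import torch


def build_optimizer_and_scheduler(model: torch.nn.Module,
                                  total_iters: int = 20000,   # assumed; see Table A7 for per-modality values
                                  peak_lr: float = 1e-5,
                                  warmup_start_lr: float = 1e-8,
                                  warmup_iters: int = 1000,
                                  decay_weight: float = 0.5):
    # AdamW with beta1 = 0.9, beta2 = 0.999 and weight decay 0.05, as reported.
    optimizer = torch.optim.AdamW(
        model.parameters(),
        lr=peak_lr,
        betas=(0.9, 0.999),
        weight_decay=0.05,
    )

    def lr_lambda(step: int) -> float:
        # Linear warm-up from warmup_start_lr up to peak_lr over warmup_iters steps.
        if step < warmup_iters:
            frac = step / max(1, warmup_iters)
            return (warmup_start_lr + frac * (peak_lr - warmup_start_lr)) / peak_lr
        # Cosine annealing from peak_lr down to decay_weight * peak_lr
        # (one assumed interpretation of the "0.5 decay weight").
        progress = (step - warmup_iters) / max(1, total_iters - warmup_iters)
        cosine = 0.5 * (1.0 + math.cos(math.pi * min(1.0, progress)))
        return decay_weight + (1.0 - decay_weight) * cosine

    scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda)
    return optimizer, scheduler
```

In a training loop, `optimizer.step()` would be followed by `scheduler.step()` once per iteration, so the warm-up and cosine decay track the iteration counts reported in Table A7.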