CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models
Authors: Saurav Jha, Dong Gong, Lina Yao
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method on CIFAR100 [3, 30], ImageNet100 [41, 43], ImageNet-R [59], CUB200 [60], and VTAB [60]. |
| Researcher Affiliation | Collaboration | Saurav Jha1, Dong Gong1, Lina Yao1,2; 1University of New South Wales (UNSW Sydney), 2CSIRO's Data61; {saurav.jha, dong.gong}@unsw.edu.au; lina.yao@data61.csiro.au |
| Pseudocode | Yes | Algorithm 1: A forward CLAP4CLIP pass at test step t |
| Open Source Code | Yes | Our code is available at https://github.com/srvCodes/clap4clip. |
| Open Datasets | Yes | We evaluate our method on CIFAR100 [3, 30], ImageNet100 [41, 43], ImageNet-R [59], CUB200 [60], and VTAB [60]. |
| Dataset Splits | Yes | we tuned our hyperparameters using a validation set comprising 10% of the CIFAR-100 training dataset. (See the data-split sketch after the table.) |
| Hardware Specification | Yes | All our experiments were performed on NVIDIA V100 GPUs hosted on the Gadi supercomputers of the National Computational Infrastructure (NCI Australia). |
| Software Dependencies | No | The paper mentions using SGD and refers to models like CLIP, but does not provide specific version numbers for software dependencies such as Python, PyTorch, or other libraries. |
| Experiment Setup | Yes | We train CLAP and its variants using SGD, with a batch size of 64, for 5 epochs, including 1 epoch of linear warmup. The initial learning rate (LR) is set to 1e-3 and decays with cosine annealing. At the end of each incremental task (t > 1), we perform memory consolidation training for 2 epochs, with an LR of 1e-4, on the class-balanced memory dataset. |
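The Experiment Setup row fully specifies the optimization schedule, so it can be translated into a short PyTorch sketch. This is a minimal illustration, not the authors' released code: the `model.loss(...)` interface, the warmup start factor, and per-step (rather than per-epoch) scheduler stepping are assumptions, since the paper states only the optimizer, batch size, epoch counts, and learning rates.

```python
# Minimal sketch of the reported schedule: SGD, batch size 64, 5 epochs
# (1 epoch linear warmup), initial LR 1e-3 with cosine annealing, then
# 2 epochs of memory consolidation at LR 1e-4 for incremental tasks t > 1.
import torch
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

def train_task(model, train_loader, memory_loader=None, task_id=1):
    epochs = 5
    steps_per_epoch = len(train_loader)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
    scheduler = SequentialLR(
        optimizer,
        schedulers=[
            # Warmup start factor is not given in the paper; 0.01 is assumed.
            LinearLR(optimizer, start_factor=0.01, total_iters=steps_per_epoch),
            CosineAnnealingLR(optimizer, T_max=(epochs - 1) * steps_per_epoch),
        ],
        milestones=[steps_per_epoch],  # switch after 1 warmup epoch
    )
    for _ in range(epochs):
        for images, labels in train_loader:      # batches of 64
            loss = model.loss(images, labels)    # hypothetical loss API
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()

    # Memory consolidation on the class-balanced memory set (t > 1).
    if task_id > 1 and memory_loader is not None:
        consol_opt = torch.optim.SGD(model.parameters(), lr=1e-4)
        for _ in range(2):
            for images, labels in memory_loader:
                loss = model.loss(images, labels)
                consol_opt.zero_grad()
                loss.backward()
                consol_opt.step()
```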
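The Dataset Splits row mentions a validation set of 10% of the CIFAR-100 training data. A minimal sketch of one way to reproduce such a split is below; the paper does not specify whether the split is random or stratified, nor the seed, so a plain seeded random split with torchvision's CIFAR100 is assumed here.

```python
# Minimal sketch of a 10% validation split from the CIFAR-100 training set.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

train_full = datasets.CIFAR100(root="./data", train=True, download=True,
                               transform=transforms.ToTensor())

val_size = int(0.1 * len(train_full))    # 10% -> 5,000 images
train_size = len(train_full) - val_size  # 90% -> 45,000 images
train_set, val_set = random_split(
    train_full, [train_size, val_size],
    generator=torch.Generator().manual_seed(0),  # hypothetical seed
)
```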