RanPAC: Random Projections and Pre-trained Models for Continual Learning

Authors: Mark D. McDonnell, Dong Gong, Amin Parvaneh, Ehsan Abbasnejad, Anton van den Hengel

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Compared to previous methods applied to pre-trained ViT-B/16 models, we reduce final error rates by between 20% and 62% on seven class-incremental benchmark datasets, despite not using any rehearsal memory.
Researcher Affiliation | Academia | (1) Australian Institute for Machine Learning, The University of Adelaide; (2) School of Computer Science and Engineering, University of New South Wales
Pseudocode | Yes | Algorithm 1 RanPAC Training
Open Source Code | Yes | Code is available at https://github.com/RanPAC/RanPAC.
Open Datasets | Yes | The seven CIL datasets we use are summarised in Table A2. For Imagenet-A, CUB, Omnibenchmark and VTAB, we used specific train-validation splits defined and outlined in detail by [65]. Those four datasets, plus Imagenet-R (created by [56]), were downloaded from links provided at https://github.com/zhoudw-zdw/RevisitingCIL. CIFAR100 was accessed through torchvision. Stanford cars was downloaded from https://ai.stanford.edu/~jkrause/cars/car_dataset.html. (A CIFAR100 loading sketch follows the table.)
Dataset Splits | Yes | For each task, t, in Phase 2, the training data for that task was randomly split in the ratio 80:20. (A split sketch follows the table.)
Hardware Specification | Yes | All experiments were conducted on a single PC running Ubuntu 22.04.2 LTS, with 32 GB of RAM, and an Intel Core i9-13900KF x32 processor. Acceleration was provided by a single NVIDIA GeForce 4090 GPU.
Software Dependencies | No | The paper mentions various software components and libraries, such as "PyTorch", "torchvision", "ViT-B/16", "ResNet", "CLIP", "AdaptFormer", "SSF", and "VPT". However, it does not specify exact version numbers for these dependencies, which are needed for exact reproducibility.
Experiment Setup | Yes | For Phase 1 in Algorithm 1, we used SGD to train the parameters of PETL methods, namely AdaptFormer [6], SSF [28], and VPT [21]. For each of these, we used batch sizes of 48, a learning rate of 0.01, weight decay of 0.0005, momentum of 0.9, and a cosine annealing schedule that finishes with a learning rate of 0. Generally we trained for 20 epochs, but in some experiments reduced to fewer epochs if overfitting was clear. (A training-loop sketch follows the table.)
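
The Open Datasets row states only that CIFAR100 was accessed through torchvision. Below is a minimal loading sketch under that statement; the 224x224 resize and ImageNet normalisation statistics are assumptions about typical ViT-B/16 preprocessing, not values given in the paper.

```python
# Hedged sketch: CIFAR100 via torchvision. The transform values are assumptions
# (typical ViT-B/16 preprocessing), not taken from the paper.
from torchvision import datasets, transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),  # ViT-B/16 input resolution (assumed)
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.485, 0.456, 0.406),
                         std=(0.229, 0.224, 0.225)),  # ImageNet stats (assumed)
])

train_set = datasets.CIFAR100(root="./data", train=True, download=True,
                              transform=transform)
test_set = datasets.CIFAR100(root="./data", train=False, download=True,
                             transform=transform)
```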
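
The 80:20 per-task split reported in the Dataset Splits row could be reproduced with torch.utils.data.random_split, as sketched below. The helper name split_task and the fixed seed are illustrative assumptions; the paper specifies only the ratio.

```python
# Hedged sketch of an 80:20 random split of one task's training data.
# `split_task` and the seed value are hypothetical, not from the paper.
import torch
from torch.utils.data import random_split

def split_task(task_dataset, train_frac=0.8, seed=0):
    n_train = int(round(train_frac * len(task_dataset)))
    n_val = len(task_dataset) - n_train
    generator = torch.Generator().manual_seed(seed)  # seed choice is an assumption
    return random_split(task_dataset, [n_train, n_val], generator=generator)

# Example: train_subset, val_subset = split_task(task_dataset)
```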
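
Finally, the Phase 1 hyperparameters in the Experiment Setup row map onto a standard PyTorch training loop as sketched below. Only the optimiser settings, batch size, epoch count, and cosine schedule come from the quoted text; the function name train_phase1, the cross-entropy loss, and per-epoch scheduler stepping are assumptions.

```python
# Hedged sketch of the reported Phase 1 settings: SGD (lr 0.01, momentum 0.9,
# weight decay 0.0005), batch size 48, cosine annealing to lr 0 over 20 epochs.
# The loss choice and scheduler stepping granularity are assumptions.
import torch
from torch.utils.data import DataLoader

def train_phase1(model, petl_parameters, train_set, epochs=20, batch_size=48):
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.SGD(petl_parameters, lr=0.01,
                                momentum=0.9, weight_decay=0.0005)
    # Cosine annealing schedule that finishes with a learning rate of 0.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs, eta_min=0.0)
    criterion = torch.nn.CrossEntropyLoss()  # loss function is an assumption
    for _ in range(epochs):
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()
        scheduler.step()
```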