RanPAC: Random Projections and Pre-trained Models for Continual Learning
Authors: Mark D. McDonnell, Dong Gong, Amin Parvaneh, Ehsan Abbasnejad, Anton van den Hengel
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Compared to previous methods applied to pre-trained ViT-B/16 models, we reduce final error rates by between 20% and 62% on seven class-incremental benchmark datasets, despite not using any rehearsal memory. |
| Researcher Affiliation | Academia | 1Australian Institute for Machine Learning, The University of Adelaide 2School of Computer Science and Engineering, University of New South Wales |
| Pseudocode | Yes | Algorithm 1 RanPAC Training |
| Open Source Code | Yes | Code is available at https://github.com/RanPAC/RanPAC. |
| Open Datasets | Yes | The seven CIL datasets we use are summarised in Table A2. For Imagenet-A, CUB, Omnibenchmark and VTAB, we used specific train-validation splits defined and outlined in detail by [65]. Those four datasets, plus Imagenet-R (created by [56]) were downloaded from links provided at https://github.com/zhoudw-zdw/RevisitingCIL. CIFAR100 was accessed through torchvision. Stanford Cars was downloaded from https://ai.stanford.edu/~jkrause/cars/car_dataset.html. |
| Dataset Splits | Yes | For each task, t, in Phase 2, the training data for that task was randomly split in the ratio 80:20. |
| Hardware Specification | Yes | All experiments were conducted on a single PC running Ubuntu 22.04.2 LTS, with 32 GB of RAM, and Intel Core i9-13900KF x32 processor. Acceleration was provided by a single NVIDIA GeForce 4090 GPU. |
| Software Dependencies | No | The paper mentions various software components and libraries, such as "PyTorch", "torchvision", "ViT-B/16", "ResNet", "CLIP", "AdaptFormer", "SSF", and "VPT". However, it does not specify exact version numbers for these dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | For Phase 1 in Algorithm 1, we used SGD to train the parameters of PETL methods, namely AdaptFormer [6], SSF [28], and VPT [21]. For each of these, we used batch sizes of 48, a learning rate of 0.01, weight decay of 0.0005, momentum of 0.9, and a cosine annealing schedule that finishes with a learning rate of 0. Generally we trained for 20 epochs, but in some experiments reduced to fewer epochs if overfitting was clear. |
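
Based on the hyperparameters quoted in the Experiment Setup row, a minimal PyTorch sketch of the Phase 1 optimiser and learning-rate schedule might look like the following. The model and data here are stand-ins, not taken from the released code; in the paper the trainable parameters would be those of a PETL method (AdaptFormer, SSF, or VPT) attached to a frozen ViT-B/16 backbone.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import CosineAnnealingLR
from torch.utils.data import DataLoader, TensorDataset

# Stand-in model and data for illustration only; the paper trains the PETL
# parameters of a frozen ViT-B/16 backbone on the first task's data.
model = nn.Linear(768, 10)
dataset = TensorDataset(torch.randn(480, 768), torch.randint(0, 10, (480,)))
train_loader = DataLoader(dataset, batch_size=48, shuffle=True)
criterion = nn.CrossEntropyLoss()

EPOCHS = 20  # reduced in some experiments when overfitting was clear

optimizer = SGD(
    (p for p in model.parameters() if p.requires_grad),
    lr=0.01,
    momentum=0.9,
    weight_decay=0.0005,
)
# Cosine annealing that finishes with a learning rate of 0.
scheduler = CosineAnnealingLR(optimizer, T_max=EPOCHS, eta_min=0.0)

for epoch in range(EPOCHS):
    for features, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()
```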
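
The per-task 80:20 train-validation split quoted in the Dataset Splits row could be reproduced along these lines; `task_dataset` and the fixed random seed are illustrative assumptions rather than details given in the paper.

```python
import torch
from torch.utils.data import TensorDataset, random_split

# Illustrative stand-in for one task's training data.
task_dataset = TensorDataset(torch.randn(1000, 768), torch.randint(0, 10, (1000,)))

# Randomly split the task's training data 80:20 into train and validation subsets.
n_train = int(0.8 * len(task_dataset))
n_val = len(task_dataset) - n_train
train_subset, val_subset = random_split(
    task_dataset,
    [n_train, n_val],
    generator=torch.Generator().manual_seed(0),  # assumed seed, for repeatability
)
print(len(train_subset), len(val_subset))  # 800 200
```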