Fast Trainable Projection for Robust Fine-tuning

Authors: Junjiao Tian, Yen-Cheng Liu, James S Smith, Zsolt Kira

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we show superior robustness on OOD datasets, including domain shifts and natural corruptions, across four different vision tasks with five different pre-trained models. Additionally, we demonstrate that FTP is broadly applicable and beneficial to other learning scenarios such as low-label and continual learning settings thanks to its easy adaptability. We show superior robustness on OOD datasets on four vision tasks with five pre-trained models and SOTA performance on a continual learning benchmark, all with a 35% speedup in Sec. 4.
Researcher Affiliation | Academia | Junjiao Tian, Georgia Institute of Technology, jtian73@gatech.edu; Yen-Cheng Liu, Georgia Institute of Technology, ycliu@gatech.edu; James Seale Smith, Georgia Institute of Technology, jamessealesmith@gatech.edu; Zsolt Kira, Georgia Institute of Technology, zkira@gatech.edu
Pseudocode | Yes | Algorithm 1 FTP: Fast Trainable Projection. ... Algorithm 2 Adam Update: implements one step update of Adam [38]. (A hedged sketch of the projection-after-update idea appears after this table.)
Open Source Code | Yes | The code will be available at https://github.com/GT-RIPL/FTP.git.
Open Datasets | Yes | For the DomainNet experiment (image classification), which consists of five domains, Real, Sketch, Painting, Infographics, and Clipart, we follow the setup of the prior work [10] and use its released code to train FTP. ... For ImageNet experiments (Tab. 3, Fig. 3), we use a CLIP pre-trained ViT-Base [4]. ... To further demonstrate the effectiveness of FTP in more diverse scenarios, we test it on PASCAL-Context [49]. ... we partition ImageNet-R (200 classes) into 10 sequential tasks. (A sketch of such a task partition follows the table.)
Dataset Splits | Yes | We use the Real domain as the ID training dataset and the rest as OOD testing datasets. ... We sweep a range of learning rates and use the validation split to determine the best learning rate for FTP for each experiment. (A sketch of such a sweep follows the table.)
Hardware Specification | Yes | Every DomainNet experiment was conducted using 4 RTX 2080 GPUs. ... Every ImageNet classification experiment was conducted on 2 A40 GPUs. ... Every PASCAL experiment was conducted on a single RTX 2080 GPU. ... Every CL experiment was conducted on 4 RTX 2080 GPUs.
Software Dependencies | No | The paper mentions 'SGD as the base optimizer', 'AdamW', 'Adam', and a 'PyTorch code example', but does not specify version numbers for PyTorch or any other software library or framework. It cites the optimizer papers, but not the specific software versions used.
Experiment Setup | Yes | For FTP, we only tuned the learning rate while keeping the other hyper-parameters fixed as in the prior work. ... We train models for 50 and 150 epochs respectively with a batch size of 256. We sweep a range of learning rates and use the validation split to determine the best learning rate for FTP for each experiment. ... we use weight-decay (0.1), drop-path (0.2) [40], label-smoothing (0.1) [41], Mixup (0.8) [42] and Cutmix (1.0) [43]. ... We train all methods using AdamW [37] as the base optimizer with a weight decay of 0.1, cosine learning rate schedule, and a batch size of 256 for 30 epochs. (A minimal optimizer/schedule sketch follows the table.)
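
As a companion to the Pseudocode row, the sketch below illustrates the general projection-after-update idea that Algorithm 1 (FTP) builds on: after each optimizer step, pull the fine-tuned weights back toward the pre-trained weights. It is not the authors' implementation — FTP learns per-layer projection radii, whereas this sketch uses a single fixed `radius`, and the helper name `project_to_pretrained` is an illustrative assumption.

```python
import torch

@torch.no_grad()
def project_to_pretrained(model, pretrained_state, radius=1.0):
    """Pull each weight tensor back inside an L2 ball of the given radius
    around its pre-trained value. In FTP the radius is trainable and learned
    per layer; here it is a fixed constant purely for illustration."""
    for name, param in model.named_parameters():
        if name not in pretrained_state:
            continue
        anchor = pretrained_state[name].to(param.device)
        diff = param - anchor
        norm = diff.norm()
        if norm > radius:
            # Rescale the deviation so it lies on the surface of the ball.
            param.copy_(anchor + diff * (radius / norm))

# Illustrative usage: snapshot the pre-trained weights once, then project
# after every optimizer step.
# pretrained_state = {k: v.detach().clone() for k, v in model.named_parameters()}
# loss.backward(); optimizer.step(); project_to_pretrained(model, pretrained_state)
```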
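
For the continual-learning split mentioned in the Open Datasets row (ImageNet-R's 200 classes partitioned into 10 sequential tasks), the snippet below shows one plausible way to form ten disjoint 20-class tasks. The paper does not state the class ordering or seed, so those details are assumptions here.

```python
import random

def make_sequential_tasks(num_classes=200, num_tasks=10, seed=0):
    """Split class indices into disjoint, equally sized tasks for
    class-incremental training; the shuffle seed is arbitrary because the
    ordering used in the paper is not specified."""
    classes = list(range(num_classes))
    random.Random(seed).shuffle(classes)
    per_task = num_classes // num_tasks
    return [classes[i * per_task:(i + 1) * per_task] for i in range(num_tasks)]

tasks = make_sequential_tasks()              # 10 lists of 20 class indices each
assert sum(len(t) for t in tasks) == 200     # every class appears exactly once
```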
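
The Dataset Splits row notes that the learning rate is chosen by sweeping candidates and scoring them on the validation split. The sketch below shows that selection loop in outline; the candidate grid and the `train_fn`/`evaluate_fn` callables are placeholders, since the paper does not list the exact values swept.

```python
def sweep_learning_rates(train_fn, evaluate_fn, candidates=(1e-4, 3e-4, 1e-3, 3e-3)):
    """Train once per candidate learning rate and keep the one with the best
    validation accuracy. `train_fn(lr)` returns a trained model and
    `evaluate_fn(model)` returns validation accuracy; both are user-supplied."""
    best_lr, best_acc = None, float("-inf")
    for lr in candidates:
        model = train_fn(lr)
        acc = evaluate_fn(model)
        if acc > best_acc:
            best_lr, best_acc = lr, acc
    return best_lr, best_acc
```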
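
For the Experiment Setup row, here is a minimal sketch of the quoted optimization recipe: AdamW with weight decay 0.1 and a cosine learning rate schedule over 30 epochs. The peak learning rate is a placeholder because the paper selects it by sweeping, and the function name `build_optimization` is an assumption.

```python
from torch.optim import AdamW
from torch.optim.lr_scheduler import CosineAnnealingLR

def build_optimization(model, lr=1e-4, epochs=30):
    """AdamW with weight decay 0.1 and a cosine schedule, matching the quoted
    setup (batch size 256 for 30 epochs). The peak learning rate is a
    placeholder; the paper selects it per experiment via a sweep."""
    optimizer = AdamW(model.parameters(), lr=lr, weight_decay=0.1)
    scheduler = CosineAnnealingLR(optimizer, T_max=epochs)  # stepped once per epoch
    return optimizer, scheduler
```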