Learning from Offline Foundation Features with Tensor Augmentations

Authors: Emir Konuk, Christos Matsoukas, Moein Sorkhei, Phitchapha Lertsiravarameth, Kevin Smith

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate the effectiveness of LOFF-TA, we benchmark over eleven datasets from various domains using different foundation models, model capacities, and image resolutions. In this section we show that LOFF-TA achieves competitive, sometimes superior, results compared to the baselines while significantly reducing memory usage and training time.
Researcher Affiliation | Academia | (1) KTH Royal Institute of Technology, Stockholm, Sweden; (2) Science for Life Laboratory, Stockholm, Sweden
Pseudocode | No | The paper describes its method verbally and with diagrams, but does not include any structured pseudocode or algorithm blocks.
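The pipeline can still be sketched from the paper's verbal description: features are extracted once from a frozen foundation model, cached offline, augmented in tensor space, and used to train a lightweight classifier. Below is a minimal, hypothetical Python sketch of that flow; the names (cache_features, train_epoch, augment) are ours, not the authors', and details such as the augmentation and feature format are assumptions.

```python
# Illustrative reconstruction only -- the paper provides no pseudocode.
import torch

@torch.no_grad()
def cache_features(foundation_model, loader):
    """Stage 1: run the frozen foundation model once, store features offline."""
    foundation_model.eval()
    feats, labels = [], []
    for images, y in loader:
        feats.append(foundation_model(images).cpu())  # e.g. DINOv2/CLIP features
        labels.append(y)
    return torch.cat(feats), torch.cat(labels)

def train_epoch(classifier, optimizer, cached_feats, cached_labels, augment,
                batch_size=64):
    """Stage 2: train only the lightweight classifier on augmented cached tensors."""
    perm = torch.randperm(len(cached_feats))
    for i in range(0, len(perm), batch_size):
        idx = perm[i:i + batch_size]
        x = augment(cached_feats[idx])  # tensor-space augmentation (the "TA")
        loss = torch.nn.functional.cross_entropy(classifier(x), cached_labels[idx])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```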
Open Source Code | Yes | The source code used in this work can be found at https://github.com/emirkonuk/loffta.
Open Datasets | Yes | Our evaluation spans eleven image classification datasets, covering a diverse spectrum of object categories. We include APTOS2019 [21] for diabetic retinopathy detection, DDSM [29] for identifying masses in mammography, ISIC [40, 8, 9] for skin lesion classification, AID [43] for aerial image classification, and NABirds [41] for fine-grained bird species classification. The resolution of these datasets varies, but we resize them to 512×512. We extend our evaluation to a number of standard 256×256 resolution benchmark datasets: Flowers102 [34], NABirds [41], Stanford Cars [26], Stanford Dogs [22], Oxford-IIIT Pet [36], Caltech-101 [12], and SUN397 [44].
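For illustration, the resizing described above could be expressed with torchvision-style transforms; this is a minimal sketch assuming default interpolation, since the paper's exact preprocessing and normalization are not specified in this excerpt.

```python
# Hypothetical preprocessing sketch for the two resolution groups above.
from torchvision import transforms

resize_512 = transforms.Compose([      # APTOS2019, DDSM, ISIC, AID, NABirds
    transforms.Resize((512, 512)),
    transforms.ToTensor(),
])

resize_256 = transforms.Compose([      # Flowers102, Stanford Cars/Dogs, etc.
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])
```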
Dataset Splits | Yes | We adhere to official train/validation/test splits when available, or follow [24] in their absence.
Hardware Specification | Yes | It was not possible to train ViT-G on a single GPU with a batch size of 64. Instead, we report the memory footprint across 8 NVIDIA Quadro RTX 8000 GPUs using distributed training. We measure each approach in terms of performance, throughput (TP), and memory (Mem.) footprint.
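A single-GPU throughput and peak-memory measurement along these lines could look as follows. This is a hedged sketch using PyTorch's built-in CUDA memory statistics, not the authors' benchmarking code, and it omits the 8-GPU distributed setup used for ViT-G.

```python
# Hypothetical measurement of throughput (images/s) and peak GPU memory (GB).
import time
import torch

def measure(model, loader, device="cuda"):
    model.to(device)
    torch.cuda.reset_peak_memory_stats(device)
    n_images, start = 0, time.perf_counter()
    for images, _ in loader:
        out = model(images.to(device))
        out.sum().backward()              # include the backward pass, as in training
        n_images += images.size(0)
    torch.cuda.synchronize(device)        # wait for pending kernels before timing
    throughput = n_images / (time.perf_counter() - start)
    peak_mem_gb = torch.cuda.max_memory_allocated(device) / 1024**3
    return throughput, peak_mem_gb
```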
Software Dependencies | No | The paper mentions several software components and models used (e.g., AdamW optimizer [32], DeiT-S [39], DINOv2 [35], CLIP [37], OpenCLIP [19]), but it does not specify concrete version numbers for these packages or for any underlying frameworks/libraries (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | In our experiments, we utilize the AdamW optimizer [32] and a batch size of 64. We incorporate a learning rate warm-up strategy and manually decrease the learning rate by a factor of 0.1 when the validation performance plateaus. For the lightweight classifier in LOFF and LOFF-TA, we implement modifications to the DeiT-S architecture [39]. We remove the patchifier from the model's stem and introduce a linear projection layer followed by a normalization layer, as detailed in Section 3.
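A minimal PyTorch sketch of this setup follows: AdamW, a linear warm-up, and a reduce-on-plateau decay of 0.1, together with one possible reading of the modified DeiT-S stem. The input width, learning rate, and warm-up length are illustrative assumptions, not values stated in the paper.

```python
# Hypothetical sketch of the stated training setup and stem modification.
import torch
from torch import nn

class FeatureStem(nn.Module):
    """Replaces the DeiT-S patchifier: project cached features to the embed dim."""
    def __init__(self, in_dim, embed_dim=384):    # 384 = DeiT-S embedding width
        super().__init__()
        self.proj = nn.Linear(in_dim, embed_dim)  # linear projection layer
        self.norm = nn.LayerNorm(embed_dim)       # followed by a normalization layer

    def forward(self, x):
        return self.norm(self.proj(x))

stem = FeatureStem(in_dim=1024)                   # 1024-dim input is an assumption
optimizer = torch.optim.AdamW(stem.parameters(), lr=1e-4)   # lr is an assumption
warmup = torch.optim.lr_scheduler.LinearLR(optimizer, start_factor=0.1,
                                           total_iters=500)  # call .step() per batch
plateau = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)
# After each validation epoch: plateau.step(val_metric)
```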