Learning from Offline Foundation Features with Tensor Augmentations
Authors: Emir Konuk, Christos Matsoukas, Moein Sorkhei, Phitchapha Lertsiravarameth, Kevin Smith
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness of LOFF-TA, we benchmark across eleven datasets from various domains using different foundation models, model capacities and image resolutions. In this section we show that LOFF-TA achieves competitive, sometimes superior, results compared to the baselines while significantly reducing memory usage and training time. |
| Researcher Affiliation | Academia | 1 KTH Royal Institute of Technology, Stockholm, Sweden 2 Science for Life Laboratory, Stockholm, Sweden |
| Pseudocode | No | The paper describes its method verbally and with diagrams, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code used in this work can be found at https://github.com/emirkonuk/loffta. |
| Open Datasets | Yes | Our evaluation spans eleven image classification datasets, covering a diverse spectrum of object categories. We include APTOS2019 [21] for diabetic retinopathy detection, DDSM [29] for identifying masses in mammography, ISIC [40, 8, 9] for skin lesion classification, AID [43] for aerial image classification, and NABirds [41] for fine-grained bird species classification. The resolution of these datasets varies, but we resize them to 512×512. We extend our evaluation to a number of standard 256×256 resolution benchmark datasets: Flowers102 [34], NABirds [41], Stanford Cars [26], Stanford Dogs [22], Oxford-IIIT Pet [36], Caltech-101 [12], and SUN397 [44]. |
| Dataset Splits | Yes | We adhere to official train/validation/test splits when available, or follow [24] in their absence. |
| Hardware Specification | Yes | It was not possible to train ViT-G on a single GPU with a batch size of 64. Instead we report the memory footprint across 8 NVIDIA Quadro RTX 8000 using distributed training. We measure each approach in terms of performance, throughput (TP), and memory (Mem.) footprint. |
| Software Dependencies | No | The paper mentions several software components and models used (e.g., AdamW optimizer [32], DeiT-S [39], DINOv2 [35], CLIP [37], OpenCLIP [19]), but it does not specify concrete version numbers for these software packages or any underlying frameworks/libraries (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | In our experiments, we utilize the AdamW optimizer [32] and a batch size of 64. We incorporate a learning rate warm-up strategy and manually decrease the learning rate by a factor of 0.1 when the validation performance plateaus. For the lightweight classifier in LOFF and LOFF-TA, we implement modifications to the DeiT-S architecture [39]. We remove the patchifier from the model's stem and introduce a linear projection layer followed by a normalization layer as detailed in 3. |
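The training schedule quoted in the Experiment Setup row (learning-rate warm-up, then a manual ×0.1 decay when validation performance plateaus) can be sketched as below. The warm-up length and plateau patience are illustrative assumptions; the excerpt does not specify them.

```python
class WarmupPlateauSchedule:
    """Sketch of the quoted schedule: linear warm-up, then multiply the LR
    by `factor` whenever the validation metric fails to improve for
    `patience` consecutive evaluations.

    `warmup_steps` and `patience` are illustrative guesses, not values
    taken from the paper.
    """

    def __init__(self, base_lr, warmup_steps=500, factor=0.1, patience=3):
        self.warmup_steps = warmup_steps
        self.factor = factor
        self.patience = patience
        self.lr = base_lr
        self.best = float("-inf")
        self.bad_evals = 0

    def lr_at_step(self, step):
        # Linear warm-up from ~0 up to the current LR; constant afterwards.
        if step < self.warmup_steps:
            return self.lr * (step + 1) / self.warmup_steps
        return self.lr

    def on_validation(self, metric):
        # Decay the LR by `factor` once the metric has plateaued.
        if metric > self.best:
            self.best = metric
            self.bad_evals = 0
        else:
            self.bad_evals += 1
            if self.bad_evals >= self.patience:
                self.lr *= self.factor
                self.bad_evals = 0
        return self.lr
```

In practice the same behaviour is usually obtained by combining an optimizer-level warm-up with a reduce-on-plateau scheduler; the class above just makes the two phases explicit.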
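The architectural change quoted above (removing the patchifier and feeding cached features through a linear projection followed by normalization) can be sketched in plain Python as follows. All dimensions and parameter names here are illustrative assumptions; the excerpt does not give them.

```python
import math

def projected_stem(cached_feature, weight, bias, eps=1e-6):
    """Sketch of the modified classifier stem: a cached foundation-model
    feature vector is linearly projected to the transformer width, then
    layer-normalized (the patch-embedding stem is removed).

    `weight` is an (in_dim x out_dim) matrix and `bias` an out_dim vector;
    their shapes and initialization are illustrative, not from the paper.
    """
    # Linear projection: out[j] = sum_i x[i] * W[i][j] + b[j]
    projected = [
        sum(x * weight[i][j] for i, x in enumerate(cached_feature)) + bias[j]
        for j in range(len(bias))
    ]
    # Layer normalization over the projected vector (no affine parameters).
    mean = sum(projected) / len(projected)
    var = sum((v - mean) ** 2 for v in projected) / len(projected)
    return [(v - mean) / math.sqrt(var + eps) for v in projected]
```

The design point is that the classifier never sees pixels: the expensive foundation model runs once offline, and only this cheap projection plus the transformer blocks are trained.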