Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Learning from Offline Foundation Features with Tensor Augmentations
Authors: Emir Konuk, Christos Matsoukas, Moein Sorkhei, Phitchapha Lertsiravarameth, Kevin Smith
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate the effectiveness of LOFF-TA we benchmark over eleven datasets from various domains using different foundation models, model capacities and image resolutions. In this section we show that LOFF-TA achieves competitive, sometimes superior, results compared to the baselines while significantly reducing memory usage and training time. |
| Researcher Affiliation | Academia | 1 KTH Royal Institute of Technology, Stockholm, Sweden 2 Science for Life Laboratory, Stockholm, Sweden |
| Pseudocode | No | The paper describes its method verbally and with diagrams, but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code used in this work can be found at https://github.com/emirkonuk/loffta. |
| Open Datasets | Yes | Our evaluation spans eleven image classification datasets, covering a diverse spectrum of object categories. We include APTOS2019 [21] for diabetic retinopathy detection, DDSM [29] for identifying masses in mammography, ISIC [40, 8, 9] for skin lesion classification, AID [43] for aerial image classification, and NABirds [41] for fine-grained bird species classification. The resolution of these datasets varies, but we resize them to 512 × 512. We extend our evaluation to a number of standard 256 × 256 resolution benchmark datasets: Flowers102 [34], NABirds [41], Stanford Cars [26], Stanford Dogs [22], Oxford-IIIT Pet [36], Caltech-101 [12], and SUN397 [44]. |
| Dataset Splits | Yes | We adhere to official train/validation/test splits when available, or follow [24] in their absence. |
| Hardware Specification | Yes | It was not possible to train ViT-G on a single GPU with batch size of 64. Instead we report the memory footprint across 8 NVIDIA Quadro RTX 8000 using distributed training. We measure each approach in terms of performance, throughput (TP), and memory (Mem.) footprint. |
| Software Dependencies | No | The paper mentions several software components and models used (e.g., AdamW optimizer [32], DeiT-S [39], DINOv2 [35], CLIP [37], OpenCLIP [19]), but it does not specify concrete version numbers for these software packages or any underlying frameworks/libraries (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | In our experiments, we utilize the AdamW optimizer [32], and a batch size of 64. We incorporate a learning rate warm-up strategy and manually decrease the learning rate by a factor of 0.1 when the validation performance plateaus. For lightweight classifier in LOFF and LOFF-TA, we implement modifications to the DeiT-S architecture [39]. We remove the patchifier from the model's stem and introduce a linear projection layer followed by a normalization layer as detailed in 3. |
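
The learning rate strategy quoted in the Experiment Setup row (warm-up, then a reduction by a factor of 0.1 when validation performance plateaus) could be approximated as in the sketch below. This is an illustration and not the authors' code: the paper says the reduction was applied manually, so the `patience` rule is a hypothetical automated stand-in, and `base_lr`, `warmup_epochs`, and the linear warm-up shape are assumed. Only the 0.1 factor comes from the quoted text.

```python
from dataclasses import dataclass


@dataclass
class WarmupPlateauSchedule:
    """Linear warm-up, then scale the learning rate by `factor` on plateaus."""

    base_lr: float = 1e-3      # assumed value, not reported in the table
    warmup_epochs: int = 5     # assumed value, not reported in the table
    factor: float = 0.1        # reduction factor quoted in the setup
    patience: int = 3          # hypothetical stand-in for the manual decision
    best: float = float("-inf")
    stale: int = 0
    scale: float = 1.0

    def lr(self, epoch: int) -> float:
        """Return the learning rate for a 0-indexed epoch."""
        if epoch < self.warmup_epochs:
            # Assumed linear ramp that reaches base_lr on the last warm-up epoch.
            return self.base_lr * (epoch + 1) / self.warmup_epochs
        return self.base_lr * self.scale

    def observe(self, val_metric: float) -> None:
        """Record a validation score (higher is better) after an epoch."""
        if val_metric > self.best:
            self.best = val_metric
            self.stale = 0
            return
        self.stale += 1
        if self.stale >= self.patience:
            # Plateau detected: apply the 0.1 reduction and restart the count.
            self.scale *= self.factor
            self.stale = 0
```

In a training loop, `lr(epoch)` would be read before each epoch and `observe(score)` called after each validation pass. Reproducing the reported numbers would still require the values the paper actually used, which this summary does not list.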