LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections

Authors: Muhammad Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Horst Possegger, Mateusz Kozinski, Rogerio Feris, Horst Bischof

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We extensively evaluate our approach on 12 different datasets belonging to widely different domains. More specifically, we use four datasets containing common natural categories: ImageNet [53], CIFAR-10/100 [55] and Caltech-101 [56].
Researcher Affiliation | Collaboration | 1) Institute of Computer Graphics and Vision, TU Graz, Austria; 2) Christian Doppler Laboratory for Embedded Machine Learning; 3) MIT-IBM Watson AI Lab, USA.
Pseudocode | No | The paper includes diagrams (e.g., Figure 1, Figure 2) to illustrate the proposed method but does not provide formal pseudocode blocks or algorithms.
Open Source Code | No | The paper provides a 'Project Page: https://jmiemirza.github.io/LaFTer/'. While project pages often contain links to code, this is not an explicit statement of code release for the methodology, nor is it a direct link to a source-code repository.
Open Datasets | Yes | More specifically, we use four datasets containing common natural categories: ImageNet [53], CIFAR-10/100 [55] and Caltech-101 [56]. EuroSat [57] contains satellite images of 10 different locations. UCF-101 [58] is an action recognition dataset. SUN-397 [59] contains images from 397 naturally occurring scenes. Flowers-102 [60] is a fine-grained classification dataset for classifying different categories of flowers commonly occurring in the United Kingdom. ImageNet-A (Adversarial) [61], ImageNet-S (Sketch) [62] and ImageNet-R (Rendition) [63] are different versions of the original ImageNet validation set.
Dataset Splits | Yes | In our setting, we divide ImageNet-A, ImageNet-S and ImageNet-R into an 80% train and 20% test set. For all other datasets we use the splits provided by [30]. (A minimal split sketch follows the table.)
Hardware Specification | Yes | For example, 3000 epochs of training the classifier on the dataset of 130,000 text sentences representing the 1000 classes of the ImageNet [53] dataset completes in 120 seconds on an NVIDIA 3090 graphics card.
Software Dependencies | No | The paper mentions software components like 'AdamW as optimizer', 'GPT-3 [9]', 'Alpaca [64]', and 'CLIP pre-trained model from OpenAI [1]' but does not provide specific version numbers for any of these or other software dependencies.
Experiment Setup | Yes | For training this classifier, we load the complete text dataset as a single batch and optimize the network using AdamW as the optimizer, with a learning rate of 0.001. For unsupervised fine-tuning using visual data (Section 3.2), we again use the AdamW optimizer with a learning rate of 0.0001, a batch size of 50, and optimize the learnable parameters for a total of 50 epochs. (An optimizer sketch follows the table.)
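
For the Dataset Splits row, the following is a minimal sketch of the 80%/20% train/test division of ImageNet-A/S/R quoted above, assuming a torchvision ImageFolder-style directory; the data path and seed are illustrative assumptions, not values reported in the paper.

```python
# Illustrative 80/20 split of an ImageNet-A style folder; the path and the
# random seed are assumptions, not values from the paper.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

dataset = datasets.ImageFolder("data/imagenet-a", transform=transforms.ToTensor())
n_train = int(0.8 * len(dataset))                  # 80% train
train_set, test_set = random_split(
    dataset,
    [n_train, len(dataset) - n_train],             # remaining 20% test
    generator=torch.Generator().manual_seed(0),    # fixed seed for a reproducible split
)
```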
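For the Experiment Setup row, the following is a minimal sketch of the two quoted optimization stages: the text-only classifier trained with AdamW at a learning rate of 0.001 on the full text dataset as a single batch, and the unsupervised visual fine-tuning with AdamW at a learning rate of 0.0001, batch size 50, for 50 epochs. The feature dimension, the random stand-in data, and the pseudo-labeling objective are hypothetical placeholders, not the authors' implementation.

```python
# Sketch of the two training stages described in the Experiment Setup row.
# Feature dims, stand-in tensors, and the stage-2 objective are placeholders.
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Stage 1: text-only classifier, whole text dataset loaded as a single batch.
text_classifier = nn.Linear(512, 1000)              # assumed CLIP feature dim and ImageNet classes
text_features = torch.randn(130_000, 512)           # stand-in for encoded text sentences
text_labels = torch.randint(0, 1000, (130_000,))    # stand-in class labels for the sentences
text_opt = torch.optim.AdamW(text_classifier.parameters(), lr=1e-3)
for _ in range(3000):                               # epoch count quoted for the ImageNet-scale text set
    loss = nn.functional.cross_entropy(text_classifier(text_features), text_labels)
    text_opt.zero_grad()
    loss.backward()
    text_opt.step()

# Stage 2: unsupervised fine-tuning on (here: dummy) unlabeled image features.
dummy_images = TensorDataset(torch.randn(500, 512)) # stand-in for encoded unlabeled images
visual_opt = torch.optim.AdamW(text_classifier.parameters(), lr=1e-4)
loader = DataLoader(dummy_images, batch_size=50, shuffle=True)
for _ in range(50):
    for (feats,) in loader:
        logits = text_classifier(feats)
        # Crude self-labeling stand-in; the paper's pseudo-labeling objective differs.
        loss = nn.functional.cross_entropy(logits, logits.softmax(-1).argmax(-1))
        visual_opt.zero_grad()
        loss.backward()
        visual_opt.step()
```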