LaFTer: Label-Free Tuning of Zero-shot Classifier using Language and Unlabeled Image Collections
Authors: Muhammad Jehanzeb Mirza, Leonid Karlinsky, Wei Lin, Horst Possegger, Mateusz Kozinski, Rogerio Feris, Horst Bischof
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our approach on 12 different datasets belonging to widely different domains. More specifically, we use four datasets containing common natural categories: ImageNet [53], CIFAR-10/100 [55] and Caltech-101 [56]. |
| Researcher Affiliation | Collaboration | 1Institute of Computer Graphics and Vision, TU Graz, Austria. 2Christian Doppler Laboratory for Embedded Machine Learning. 3MIT-IBM Watson AI Lab, USA. |
| Pseudocode | No | The paper includes diagrams (e.g., Figure 1, Figure 2) to illustrate the proposed method but does not provide formal pseudocode blocks or algorithms. |
| Open Source Code | No | The paper provides a 'Project Page: https://jmiemirza.github.io/LaFTer/'. While project pages often contain links to code, this is not an explicit statement of code release for the methodology, nor is it a direct link to a source-code repository. |
| Open Datasets | Yes | More specifically, we use four datasets containing common natural categories: ImageNet [53], CIFAR-10/100 [55] and Caltech-101 [56]. EuroSat [57] contains satellite images of 10 different locations. UCF-101 [58] is an action recognition dataset. SUN-397 [59] contains images from 397 naturally occurring scenes. Flowers-102 [60] is a fine-grained classification dataset for classifying different categories of flowers commonly occurring in the United Kingdom. ImageNet-A (Adversarial) [61], ImageNet-S (Sketch) [62] and ImageNet-R (Rendition) [63] are different versions of the original ImageNet validation set. |
| Dataset Splits | Yes | In our setting, we divide ImageNet-A, ImageNet-S and ImageNet-R into an 80% train and 20% test set. For all other datasets we use the splits provided by [30]. |
| Hardware Specification | Yes | For example, 3000 epochs of training the classifier on the dataset of 130,000 text sentences, representing the 1000 classes of the ImageNet [53] dataset, is completed in 120 seconds on an NVIDIA 3090 graphics card. |
| Software Dependencies | No | The paper mentions software components like 'AdamW as optimizer', 'GPT-3 [9]', 'Alpaca [64]', and 'CLIP pre-trained model from OpenAI [1]' but does not provide specific version numbers for any of these or other software dependencies. |
| Experiment Setup | Yes | For training this classifier, we load the complete text dataset as a single batch and optimize the network using AdamW as optimizer, with a learning rate of 0.001. For unsupervised fine-tuning using visual data (Section 3.2), we again use the AdamW optimizer with a learning rate of 0.0001, batch size of 50 and optimize the learnable parameters for a total of 50 epochs. |
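The optimizer settings reported under Experiment Setup can be written down as a minimal PyTorch sketch. This is not the authors' code: the classifier shape (a linear head over 512-dim CLIP-like text embeddings) and the variable names are assumptions for illustration; only the AdamW choice, the learning rates (0.001 and 0.0001), the batch size of 50, and the 50 epochs come from the paper.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the classifier trained on text embeddings;
# the 512-dim embedding size and 10 classes are illustrative assumptions.
embed_dim, num_classes = 512, 10
classifier = nn.Linear(embed_dim, num_classes)

# Stage 1 (text-only pre-training): AdamW with lr = 0.001; the paper
# states the complete text dataset is loaded as a single batch.
text_optimizer = torch.optim.AdamW(classifier.parameters(), lr=1e-3)

# Stage 2 (unsupervised visual fine-tuning, Sec. 3.2): AdamW with
# lr = 0.0001, batch size 50, for 50 epochs, over the learnable
# parameters (here the same head serves as a placeholder).
visual_optimizer = torch.optim.AdamW(classifier.parameters(), lr=1e-4)
batch_size, num_epochs = 50, 50
```

This only pins down the hyperparameters the paper reports; the actual learnable-parameter set and training loops would follow the method described in the paper.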