Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Text Descriptions are Compressive and Invariant Representations for Visual Learning
Authors: Zhili Feng, Anna Bair, J. Zico Kolter
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the improved performance of AVD in three settings: zero-shot (no parameter update), linear probing (only the last layer is updated), and full model finetuning. Throughout the experiments, we focus on the few-shot setting. We test our method on ImageNet, ImageNet-R, ImageNet-V2, ImageNet-A, ImageNet-Sketch, and ObjectNet (Deng et al., 2009; Hendrycks et al., 2021a;b; Recht et al., 2019; Wang et al., 2019; Barbu et al., 2019), demonstrating the superiority of the sparsely learned visual descriptors ensemble. ... SLR-AVD outperforms baselines in both in-distribution (ID) and out-of-distribution (OOD) image classification across a range of image datasets. Specifically, SLR-AVD on ImageNet and its variations (including ImageNet-R, ImageNet-V2, etc.) outperforms linear probing with image features by 6.2% to 10.48% varying k-shot from k = 1 to k = 32. |
| Researcher Affiliation | Collaboration | Zhili Feng (Machine Learning Department, Carnegie Mellon University); Anna Bair (Machine Learning Department, Carnegie Mellon University); J. Zico Kolter (Computer Science Department, Carnegie Mellon University; Bosch Center for AI) |
| Pseudocode | No | The paper includes a figure (Figure 1: An overview of our proposed method) that visually describes the method's flow, but it does not present structured pseudocode or an algorithm block with explicit steps. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing their code or provide a direct link to a code repository for the methodology described. It mentions a CLIP github repository in Section 4, but that refers to a third-party resource used for templates, not their own implementation. |
| Open Datasets | Yes | We test our method on ImageNet, ImageNet-R, ImageNet-V2, ImageNet-A, ImageNet-Sketch, and ObjectNet (Deng et al., 2009; Hendrycks et al., 2021a;b; Recht et al., 2019; Wang et al., 2019; Barbu et al., 2019)... We conduct numerical evaluations on CIFAR-10 (Krizhevsky et al., 2009), CIFAR-10.1 (Recht et al., 2018), and CIFAR-10.2 (Lu et al., 2020)... We further conduct experiments on the WILDS benchmark (Koh et al., 2021), specifically, iWildCam (Beery et al., 2021) and FMoW (Christie et al., 2018). |
| Dataset Splits | Yes | The hyperparameters are swept over disjoint training and validation sets of size 20 per class for LP and SLR-AVD. For FT and SLR-FT-AVD, we select hyperparameters using a training and validation set of size 4 per class. ... We compare SLR-AVD to LP with {1, 2, 4, 8, 16, 32} shots per class. |
| Hardware Specification | No | The paper mentions a 'GPU implementation' in the appendix when discussing the SAGA method, but it does not specify any particular GPU model, CPU, or other hardware details used for the experiments. |
| Software Dependencies | No | The paper mentions several software components like 'GPT-3', 'GPT-4', 'Llama2-13B-chat', 'CLIP', 'SAGA (Defazio et al., 2014)', 'scikit-learn', and 'AdamW'. However, specific version numbers for these components are not provided. |
| Experiment Setup | Yes | The hyperparameters are swept over disjoint training and validation sets of size 20 per class for LP and SLR-AVD. For ℓ1 regularization... we apply the GPU implementation... of a variance-reduction proximal gradient method SAGA (Defazio et al., 2014). We adopt the regularization path approach, in which the solver optimizes over 100 regularization strengths λ1 > λ2 > ... > λ100. Here we set λ1 to be the strength that returns a model that uses none of the features, and λ100 = 0.1 λ1. For LP... we use L-BFGS implemented by scikit-learn, and search for the regularization strength over 100 grids between 0.5 and 6. All the λs are evenly spread in the log-space. For FT and SLR-FT-AVD, we select hyperparameters using a training and validation set of size 4 per class. The batch size is fixed to be 512 and the number of epochs is fixed to be 10. We always optimize with AdamW, and choose a cosine rate scheduler with warm-ups. We randomly select learning rate in [1e-8, 3e-5], weight decay in [0.1, 0.12], and warm up steps in {0, 50, 500}, for 20 trials. |
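The regularization grids quoted above (100 strengths evenly spread in log-space, with λ100 = 0.1 λ1 for the ℓ1 path and a 0.5–6 range for LP) can be reproduced with a few lines of NumPy. This is a minimal sketch of the grid construction only, not the authors' code; the value of λ1 is a placeholder, since in the paper it is the data-dependent strength that zeroes out all features.

```python
import numpy as np

# Hypothetical lambda_1; in the paper this is the smallest strength
# at which the l1-regularized model selects no features.
lambda_1 = 1.0

# SLR-AVD l1 path: 100 strengths from lambda_1 down to 0.1 * lambda_1,
# evenly spaced in log-space (descending, as in a regularization path).
l1_path = np.logspace(np.log10(lambda_1), np.log10(0.1 * lambda_1), num=100)

# LP baseline: 100 regularization strengths between 0.5 and 6,
# also evenly spaced in log-space, searched with scikit-learn's L-BFGS solver.
lp_grid = np.logspace(np.log10(0.5), np.log10(6.0), num=100)
```

Under this reading, `l1_path` starts at λ1 and decreases monotonically to 0.1 λ1, matching the λ1 > λ2 > ... > λ100 ordering in the quote.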