Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Text Descriptions are Compressive and Invariant Representations for Visual Learning
Authors: Zhili Feng, Anna Bair, J. Zico Kolter
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the improved performance of AVD in three settings: zero-shot (no parameter update), linear probing (only the last layer is updated), and full model finetuning. Throughout the experiments, we focus on the few-shot setting. We test our method on ImageNet, ImageNet-R, ImageNet-V2, ImageNet-A, ImageNet-Sketch, and ObjectNet (Deng et al., 2009; Hendrycks et al., 2021a;b; Recht et al., 2019; Wang et al., 2019; Barbu et al., 2019), demonstrating the superiority of the sparsely learned visual descriptors ensemble. ... SLR-AVD outperforms baselines in both in-distribution (ID) and out-of-distribution (OOD) image classification across a range of image datasets. Specifically, SLR-AVD on ImageNet and its variations (including ImageNet-R, ImageNet-V2, etc.) outperforms linear probing with image features by 6.2% to 10.48% varying k-shot from k = 1 to k = 32. |
| Researcher Affiliation | Collaboration | Zhili Feng (Machine Learning Department, Carnegie Mellon University); Anna Bair (Machine Learning Department, Carnegie Mellon University); J. Zico Kolter (Computer Science Department, Carnegie Mellon University; Bosch Center for AI) |
| Pseudocode | No | The paper includes a figure (Figure 1: An overview of our proposed method) that visually describes the method's flow, but it does not present structured pseudocode or an algorithm block with explicit steps. |
| Open Source Code | No | The paper does not contain an explicit statement about releasing their code or provide a direct link to a code repository for the methodology described. It mentions a CLIP github repository in Section 4, but that refers to a third-party resource used for templates, not their own implementation. |
| Open Datasets | Yes | We test our method on ImageNet, ImageNet-R, ImageNet-V2, ImageNet-A, ImageNet-Sketch, and ObjectNet (Deng et al., 2009; Hendrycks et al., 2021a;b; Recht et al., 2019; Wang et al., 2019; Barbu et al., 2019)... We conduct numerical evaluations on CIFAR-10 (Krizhevsky et al., 2009), CIFAR-10.1 (Recht et al., 2018), and CIFAR-10.2 (Lu et al., 2020)... We further conduct experiments on the WILDS benchmark (Koh et al., 2021), specifically, iWildCam (Beery et al., 2021) and FMoW (Christie et al., 2018). |
| Dataset Splits | Yes | The hyperparameters are swept over disjoint training and validation sets of size 20 per class for LP and SLR-AVD. For FT and SLR-FT-AVD, we select hyperparameters using a training and validation set of size 4 per class. ... We compare SLR-AVD to LP with {1, 2, 4, 8, 16, 32} shots per class. |
| Hardware Specification | No | The paper mentions a 'GPU implementation' in the appendix when discussing the SAGA method, but it does not specify any particular GPU model, CPU, or other hardware details used for the experiments. |
| Software Dependencies | No | The paper mentions several software components like 'GPT-3', 'GPT-4', 'Llama2-13B-chat', 'CLIP', 'SAGA (Defazio et al., 2014)', 'scikit-learn', and 'AdamW'. However, specific version numbers for these components are not provided. |
| Experiment Setup | Yes | The hyperparameters are swept over disjoint training and validation sets of size 20 per class for LP and SLR-AVD. For ℓ1 regularization... we apply the GPU implementation... of a variance-reduction proximal gradient method SAGA (Defazio et al., 2014). We adopt the regularization path approach, in which the solver optimizes over 100 regularization strengths λ1 > λ2 > ... > λ100. Here we set λ1 to be the strength that returns a model that uses none of the features, and λ100 = 0.1 λ1. For LP... we use L-BFGS implemented by scikit-learn, and search for the regularization strength over 100 grids between 0.5 and 6. All the λs are evenly spread in the log-space. For FT and SLR-FT-AVD, we select hyperparameters using a training and validation set of size 4 per class. The batch size is fixed to be 512 and the number of epochs is fixed to be 10. We always optimize with AdamW, and choose a cosine rate scheduler with warm-ups. We randomly select learning rate in [1e-8, 3e-5], weight decay in [0.1, 0.12], and warm up steps in {0, 50, 500}, for 20 trials. |
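The regularization grids quoted above (100 strengths evenly spread in log-space, with λ100 = 0.1 λ1 for the ℓ1 path and a 0.5–6 range for LP) can be reproduced with a few lines of NumPy. This is a minimal sketch of the grid construction only, not the authors' code; the value of λ1 is a placeholder, since in the paper it is the data-dependent strength that zeroes out all features.

```python
import numpy as np

# Hypothetical lambda_1; in the paper this is the smallest strength
# at which the l1-regularized model selects no features.
lambda_1 = 1.0

# SLR-AVD l1 path: 100 strengths from lambda_1 down to 0.1 * lambda_1,
# evenly spaced in log-space (descending, as in a regularization path).
l1_path = np.logspace(np.log10(lambda_1), np.log10(0.1 * lambda_1), num=100)

# LP baseline: 100 regularization strengths between 0.5 and 6,
# also evenly spaced in log-space, searched with scikit-learn's L-BFGS solver.
lp_grid = np.logspace(np.log10(0.5), np.log10(6.0), num=100)
```

Under this reading, `l1_path` starts at λ1 and decreases monotonically to 0.1 λ1, matching the λ1 > λ2 > ... > λ100 ordering in the quote.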