OTTER: Effortless Label Distribution Adaptation of Zero-shot Models

Authors: Changho Shin, Jitian Zhao, Sonia Cromp, Harit Vishwakarma, Frederic Sala

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like prior matching often by significant margins in 17 out of 21 datasets.
Researcher Affiliation | Academia | Department of Computer Sciences, University of Wisconsin-Madison. {cshin23, jzhao326, cromp, hvishwakarma, fsala}@wisc.edu
Pseudocode | Yes | Algorithm 1 OTTER
1: Input: inputs X = {x_1, ..., x_n}, label distribution specification (p_1, ..., p_K), cost matrix C ∈ R^{n×K}
2: Define input marginal µ = (1/n)1 and prediction marginal ν = (p_1, ..., p_K)
3: Run optimal transport to obtain the transport plan π = argmin_{γ ∈ Π(µ, ν)} ⟨γ, C⟩
4: Get modified classification outputs ŷ_i = argmax_{j ∈ [K]} π_{i,j}; return {ŷ_i}_{i ∈ [n]}
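Algorithm 1 can be sketched in a few lines of NumPy. The sketch below uses Sinkhorn iterations (entropy-regularized OT) as a simple stand-in for the exact OT solve in step 3; the official OTTER repository may use a different solver, and the function names, the regularization strength, and the iteration count here are illustrative choices, not the paper's.

```python
import numpy as np

def sinkhorn_plan(C, mu, nu, reg=0.05, n_iter=1000):
    """Entropy-regularized OT via Sinkhorn iterations -- a stand-in for
    the exact optimal-transport solve in step 3 of Algorithm 1."""
    K = np.exp(-C / reg)                  # Gibbs kernel of the cost
    u = np.ones_like(mu)
    for _ in range(n_iter):
        v = nu / (K.T @ u)                # scale columns to match nu
        u = mu / (K @ v)                  # scale rows to match mu
    return u[:, None] * K * v[None, :]    # transport plan pi

def otter_predict(scores, class_balance):
    """Algorithm 1 (OTTER): rebalance zero-shot scores toward a target
    label marginal, then predict y_i = argmax_j pi_{i,j}."""
    n, _ = scores.shape
    C = -np.log(scores + 1e-12)           # cost c_ij = -log s(x_i, j)
    mu = np.full(n, 1.0 / n)              # uniform input marginal
    nu = np.asarray(class_balance, float) # specified label distribution
    pi = sinkhorn_plan(C, mu, nu)
    return pi.argmax(axis=1)

# Toy example: naive argmax predicts class 0 for every input, but with
# the target balance (0.5, 0.5) OTTER flips the two least-confident rows.
scores = np.array([[0.90, 0.10],
                   [0.80, 0.20],
                   [0.60, 0.40],
                   [0.55, 0.45]])
print(otter_predict(scores, [0.5, 0.5]))  # -> [0 0 1 1]
```

Note the contrast with simple thresholding: OT picks which examples to reassign jointly, moving the ones whose score gap is smallest, which is what makes the adaptation label-efficient.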
Open Source Code | Yes | Our code is available at https://github.com/SprocketLab/OTTER.
Open Datasets | Yes | We used 17 image classification datasets and 4 text classification datasets. ... CIFAR10, CIFAR100 [33], Caltech101 [22], Caltech256 [25], Food101 [8], STL10 [16], SUN397 [67], Flower102 [42], EuroSAT [27], Oxford-IIIT Pet [44], Stanford Cars [32], DTD [14], CUB [61], ImageNet [18], ImageNet-R [29], and ImageNet-Sketch [63]. Zero-shot text classification datasets: We use Amazon [41], Gender [20], Civil Comments [7], and HateXplain [39].
Dataset Splits | Yes | We selected hyperparameters through grid search, by evaluating their performance on a validation set consisting of 10 labeled examples per class.
Hardware Specification | Yes | Measurements were taken using a machine equipped with an Intel Core i7-11700K @ 3.60GHz processor, 64GB RAM, and an NVIDIA RTX 4090 GPU.
Software Dependencies | No | The paper mentions using CLIP [49] and BERT [19] models, but does not provide specific version numbers for these or any other software libraries, frameworks, or programming languages used in the experiments.
Experiment Setup | Yes | We selected hyperparameters through grid search, by evaluating their performance on a validation set consisting of 10 labeled examples per class. ... Temperature: [1e-3, 1e-4, 1e-5, 1e-6, 1e-7]; Learning rate: [1e-3, 1e-4, 1e-5, 1e-6, 1e-7] ... For zero-shot image classification, we employ CLIP [49] models. We used a 'a photo of a [CLASS]' prompt. Scores are computed by s_θ(x_i, j) = exp(cos(f(x_i), g(y_j))/τ) / Σ_{j'=1}^{K} exp(cos(f(x_i), g(y_{j'}))/τ) for image x_i regarding label j, given the image encoder f and the text encoder g. The cost matrix is constructed as C = [c_ij]_{i∈[n], j∈[K]}, where c_ij = −log s_θ(x_i, j). We run Algorithm 1 with the true class balance of the test dataset.
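The score and cost construction above can be sketched directly: a temperature-scaled softmax over cosine similarities between image and text embeddings, followed by a negative-log cost. This is a minimal sketch; the function names are illustrative, and τ = 0.01 is an assumed example value, not a hyperparameter from the paper's grid.

```python
import numpy as np

def zero_shot_scores(image_emb, text_emb, tau=0.01):
    """Softmax over cosine similarities, following the score definition
    s_theta(x_i, j) above. tau is the temperature (illustrative value)."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = (img @ txt.T) / tau                 # cos(f(x_i), g(y_j)) / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    s = np.exp(logits)
    return s / s.sum(axis=1, keepdims=True)      # each row sums to 1

def cost_matrix(scores, eps=1e-12):
    """Cost matrix for Algorithm 1: c_ij = -log s_theta(x_i, j)."""
    return -np.log(scores + eps)
```

Since each row of the score matrix is a probability distribution over the K labels, the resulting cost matrix is finite everywhere, which is what the OT solve in Algorithm 1 requires.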