reproducibilityindex.ai

OTTER: Effortless Label Distribution Adaptation of Zero-shot Models

Authors: Changho Shin, Jitian Zhao, Sonia Cromp, Harit Vishwakarma, Frederic Sala

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, we validate our method in a wide array of zero-shot image and text classification tasks, improving accuracy by 4.8% and 15.9% on average, and beating baselines like prior matching often by significant margins in 17 out of 21 datasets.
Researcher Affiliation	Academia	Department of Computer Sciences, University of Wisconsin-Madison {cshin23, jzhao326, cromp, hvishwakarma, fsala}@wisc.edu
Pseudocode	Yes	Algorithm 1 (OTTER). 1: Input: $X = \{x_1, \ldots, x_n\}$, label distribution specification $(p_1, \ldots, p_K)$, cost matrix $C \in \mathbb{R}^{n \times K}$. 2: Define input marginal $\mu = \frac{1}{n}\mathbf{1}$ and prediction marginal $\nu = (p_1, \ldots, p_K)$. 3: Run optimal transport and obtain the transport plan $\pi = \arg\min_{\gamma \in \Pi(\mu, \nu)} \langle \gamma, C \rangle$. 4: Get modified classification outputs $\hat{y}_i = \arg\max_{j \in [K]} \pi_{i,j}$. Return $\{\hat{y}_i\}_{i \in [n]}$.
Open Source Code	Yes	Our code is available at https://github.com/SprocketLab/OTTER.
Open Datasets	Yes	We used 17 image classification datasets and 4 text classification datasets. ... CIFAR10, CIFAR100 [33], Caltech101 [22], Caltech256 [25], Food101 [8], STL10 [16], SUN397 [67], Flower102 [42], EuroSAT [27], Oxford IIIT Pet [44], Stanford Cars [32], DTD [14], CUB [61], ImageNet [18], ImageNet-r [29], and ImageNet-Sketch [63]. Zero-shot text classification datasets: We use Amazon [41], Gender [20], CivilComments [7], and HateXplain [39].
Dataset Splits	Yes	We selected hyperparameters through grid search, by evaluating their performance on a validation set, consisting of 10 labeled examples per class.
Hardware Specification	Yes	Measurements were taken using a machine equipped with an Intel Core i7-11700K @ 3.60GHz processor, 64GB RAM, and NVIDIA GPU RTX-4090.
Software Dependencies	No	The paper mentions using CLIP [49] and BERT [19] models, but does not provide specific version numbers for these or any other software libraries, frameworks, or programming languages used in the experiments.
Experiment Setup	Yes	We selected hyperparameters through grid search, by evaluating their performance on a validation set, consisting of 10 labeled examples per class. ... Temperature: [1e-3, 1e-4, 1e-5, 1e-6, 1e-7] Learning rate: [1e-3, 1e-4, 1e-5, 1e-6, 1e-7] ... For zero-shot image classification, we employ CLIP [49] models. We used the 'a photo of a [CLASS]' prompt. Scores are computed by $s_\theta(x_i, j) = \frac{\exp(\cos(f(x_i), g(y_j))/\tau)}{\sum_{j'=1}^{K} \exp(\cos(f(x_i), g(y_{j'}))/\tau)}$ for image $x_i$ regarding the label $j$, given the image encoder $f$ and the text encoder $g$. The cost matrix is constructed by $C = [C_{ij}]_{i \in [n], j \in [K]}$, where $C_{ij} = -\log s_\theta(x_i, j)$. We run Algorithm 1 with the true class balance of the test dataset.
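
The Pseudocode and Experiment Setup rows can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation (see the repository linked above for that). Three assumptions are made here: the function names softmax_scores, sinkhorn_plan and otter_predict are hypothetical; the cost is taken as the negative log score, so that minimizing transport cost favors high-scoring labels; and the exact optimal transport solve in step 3 is replaced by entropic-regularized Sinkhorn iterations, which only approximate it. The reg and n_iters values are illustrative, not taken from the paper.

```python
import math

def softmax_scores(cosines, tau):
    """Row-wise softmax of cosine similarities scaled by temperature tau."""
    scores = []
    for row in cosines:
        m = max(row)  # subtract the row max for numerical stability
        exps = [math.exp((c - m) / tau) for c in row]
        z = sum(exps)
        scores.append([e / z for e in exps])
    return scores

def sinkhorn_plan(cost, mu, nu, reg=0.1, n_iters=500):
    """Approximate transport plan with marginals mu (rows) and nu (columns).

    Assumption: entropic-regularized Sinkhorn stands in for exact OT.
    """
    n, k = len(mu), len(nu)
    kernel = [[math.exp(-cost[i][j] / reg) for j in range(k)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the two marginals.
        u = [mu[i] / sum(kernel[i][j] * v[j] for j in range(k)) for i in range(n)]
        v = [nu[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]

def otter_predict(scores, class_prior, reg=0.1, n_iters=500):
    """Steps 2-4 of Algorithm 1: build marginals, transport, take row argmax."""
    n = len(scores)
    # Assumed sign convention: cost = -log(score); floor avoids log(0).
    cost = [[-math.log(max(s, 1e-12)) for s in row] for row in scores]
    mu = [1.0 / n] * n  # uniform input marginal
    plan = sinkhorn_plan(cost, mu, class_prior, reg, n_iters)
    return [max(range(len(row)), key=row.__getitem__) for row in plan]

# Toy example: zero-shot scores that are biased toward class 0.
scores = [[0.9, 0.1], [0.8, 0.2], [0.6, 0.4], [0.55, 0.45]]
naive = [max(range(2), key=row.__getitem__) for row in scores]
adjusted = otter_predict(scores, class_prior=[0.5, 0.5])
print(naive)     # → [0, 0, 0, 0]
print(adjusted)  # → [0, 0, 1, 1]
```

With a balanced label specification, the two least confident rows are moved to class 1, while the plain argmax assigns every row to class 0. For realistic problem sizes, a dedicated optimal transport solver would be a more sensible choice than this loop-based sketch.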