The Benefits of Balance: From Information Projections to Variance Reduction

Authors: Lang Liu, Ronak Mehta, Soumik Pal, Zaid Harchaoui

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We illustrate how data balancing manifests in the motivating examples mentioned in Sec. 2 with experiments with CLIP-type models. We focus here on zero-shot image classification tasks. Details on these experiments, and additional ones including linear probing and zero-shot retrieval, as well as an empirical investigation of the sensitivity to misspecified marginals, are all contained in Appx. E.
Researcher Affiliation | Academia | University of Washington
Pseudocode | No | The paper describes algorithms and procedures in prose and mathematical notation but does not include explicit pseudocode blocks or algorithm listings.
Open Source Code | Yes | Code to reproduce the data and experiments can be found at https://github.com/ronakdm/balancing.
Open Datasets | Yes | For the training set, we use the ImageNet-Captions dataset [Fang et al., 2013], which pairs images from ImageNet [Deng et al., 2009] that were taken from Flickr with their original captions.
Dataset Splits | No | The paper mentions training and test sets (E.1 Datasets), but does not explicitly describe validation splits or how they were used.
Hardware Specification | Yes | Experiments were run on a CPU/GPU workstation with 12 virtual cores, 126G of memory, and four NVIDIA TITAN Xp GPUs with 12G memory each.
Software Dependencies | No | The code was written in Python 3 and we use PyTorch for automatic differentiation. The OpenCLIP and CLIP Benchmark repositories were used for zero-shot evaluation. Specific version numbers for Python, PyTorch, or the mentioned repositories are not provided.
Experiment Setup | Yes | For optimization, models were trained with stochastic gradient descent (SGD) with the learning rate tuned along the grid 1e-3, 3e-3, 1e-2, 3e-2, 1e-1 and a fixed weight decay parameter of 0.01.
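
The learning-rate sweep described in the Experiment Setup row can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' training code: only the grid values and the weight decay of 0.01 come from the text above. The toy quadratic objective, the use of a full gradient in place of a minibatch gradient, the choice to select by final training loss, and all function names (sgd_step, train, tune_learning_rate) are assumptions made for the example.

```python
# Hypothetical sketch: only the grid and the weight decay value are taken from
# the reported setup; everything else is an illustrative stand-in.

LEARNING_RATES = [1e-3, 3e-3, 1e-2, 3e-2, 1e-1]  # grid reported in the setup
WEIGHT_DECAY = 0.01                               # fixed, as reported


def sgd_step(w, grad, lr, weight_decay=WEIGHT_DECAY):
    """One SGD step with L2 weight decay added to the gradient."""
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]


def toy_loss(w, target):
    """Stand-in objective (assumption): half the squared distance to target."""
    return 0.5 * sum((wi - ti) ** 2 for wi, ti in zip(w, target))


def toy_grad(w, target):
    """Gradient of toy_loss; a real run would use a minibatch gradient."""
    return [wi - ti for wi, ti in zip(w, target)]


def train(lr, steps=100):
    """Run a fixed number of steps and return the final toy loss."""
    target = [1.0, -2.0, 0.5]
    w = [0.0, 0.0, 0.0]
    for _ in range(steps):
        w = sgd_step(w, toy_grad(w, target), lr)
    return toy_loss(w, target)


def tune_learning_rate(grid=LEARNING_RATES):
    """Try every learning rate in the grid and keep the lowest final loss."""
    results = {lr: train(lr) for lr in grid}
    best = min(results, key=results.get)
    return best, results


if __name__ == "__main__":
    best, results = tune_learning_rate()
    for lr, loss in results.items():
        print(f"lr={lr:g}  final_loss={loss:.6f}")
    print("selected lr:", best)
```

Because the summary notes that validation splits are not described, the selection criterion used here (final training loss) is only a placeholder; in practice the metric would be whatever held-out score the experiment defines.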