The Benefits of Balance: From Information Projections to Variance Reduction
Authors: Lang Liu, Ronak Mehta, Soumik Pal, Zaid Harchaoui
NeurIPS 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate how data balancing manifests in the motivating examples mentioned in Sec. 2 with experiments with CLIP-type models. We focus here on zero-shot image classification tasks. Details on these experiments, and additional ones including linear probing and zero-shot retrieval, as well as an empirical investigation of the sensitivity to misspecified marginals, are all contained in Appx. E. |
| Researcher Affiliation | Academia | University of Washington |
| Pseudocode | No | The paper describes algorithms and procedures in prose and mathematical notation but does not include explicit pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Code to reproduce the data and experiments can be found at https://github.com/ronakdm/balancing. |
| Open Datasets | Yes | For the training set, we use the ImageNet-Captions dataset [Fang et al., 2013], which pairs images from ImageNet [Deng et al., 2009] that were taken from Flickr with their original captions. |
| Dataset Splits | No | The paper mentions training and test sets (E.1 Datasets), but does not explicitly describe validation splits or how they were used. |
| Hardware Specification | Yes | Experiments were run on a CPU/GPU workstation with 12 virtual cores, 126G of memory, and four NVIDIA TITAN Xp GPUs with 12G memory each. |
| Software Dependencies | No | The code was written in Python 3 and we use PyTorch for automatic differentiation. The OpenCLIP and CLIP Benchmark repositories were used for zero-shot evaluation. Specific version numbers for Python, PyTorch, or the mentioned repositories are not provided. |
| Experiment Setup | Yes | For optimization, models were trained with stochastic gradient descent (SGD) with the learning rate tuned along the grid 1e-3, 3e-3, 1e-2, 3e-2, 1e-1 and a fixed weight decay parameter of 0.01. |
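
The optimization setup quoted in the last row can be written down as a small configuration grid. The following is a minimal sketch, not the authors' code: the names `SGDConfig`, `build_grid`, and `select_best` are hypothetical, and `score_fn` stands in for whatever train-and-evaluate routine is actually used. Only the five learning rates and the weight decay of 0.01 come from the table.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class SGDConfig:
    """One SGD hyperparameter setting (hypothetical container)."""
    learning_rate: float
    weight_decay: float = 0.01  # fixed value quoted in the table


# Learning-rate grid quoted in the table.
LEARNING_RATES = [1e-3, 3e-3, 1e-2, 3e-2, 1e-1]


def build_grid() -> List[SGDConfig]:
    """Return one configuration per learning rate in the grid."""
    return [SGDConfig(learning_rate=lr) for lr in LEARNING_RATES]


def select_best(configs: List[SGDConfig],
                score_fn: Callable[[SGDConfig], float]) -> SGDConfig:
    """Pick the configuration with the highest score.

    score_fn is an assumed placeholder for training a model with the
    given configuration and returning an evaluation metric.
    """
    return max(configs, key=score_fn)
```

In practice `score_fn` would train a model with each setting and return a held-out metric. The table notes that the paper does not describe a validation split, so the choice of metric here is left open.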