Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
The Benefits of Balance: From Information Projections to Variance Reduction
Authors: Lang Liu, Ronak Mehta, Soumik Pal, Zaid Harchaoui
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate how data balancing manifests in the motivating examples mentioned in Sec. 2 with experiments with CLIP-type models. We focus here on zero-shot image classification tasks. Details on these experiments, and additional ones including linear probing and zero-shot retrieval, as well as an empirical investigation of the sensitivity to misspecified marginals, are all contained in Appx. E. |
| Researcher Affiliation | Academia | University of Washington |
| Pseudocode | No | The paper describes algorithms and procedures in prose and mathematical notation but does not include explicit pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Code to reproduce the data and experiments can be found at https://github.com/ronakdm/balancing. |
| Open Datasets | Yes | For the training set, we use the ImageNet-Captions dataset [Fang et al., 2013], which pairs images from ImageNet [Deng et al., 2009] that were taken from Flickr with their original captions. |
| Dataset Splits | No | The paper mentions training and test sets (E.1 Datasets), but does not explicitly describe validation splits or how they were used. |
| Hardware Specification | Yes | Experiments were run on a CPU/GPU workstation with 12 virtual cores, 126G of memory, and four NVIDIA TITAN Xp GPUs with 12G memory each. |
| Software Dependencies | No | The code was written in Python 3 and we use PyTorch for automatic differentiation. The OpenCLIP and CLIP Benchmark repositories were used for zero-shot evaluation. Specific version numbers for Python, PyTorch, or the mentioned repositories are not provided. |
| Experiment Setup | Yes | For optimization, models were trained with stochastic gradient descent (SGD) with the learning rate tuned along the grid 1e-3, 3e-3, 1e-2, 3e-2, 1e-1 and a fixed weight decay parameter of 0.01. |
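
The Experiment Setup row describes a learning-rate sweep with a fixed weight decay. The following is a minimal sketch of what such a sweep could look like, not the authors' code: only the grid values and the weight decay of 0.01 come from the table. The toy quadratic objective, the deterministic gradient steps standing in for SGD, and the names `sgd_final_loss` and `tune_learning_rate` are all illustrative assumptions.

```python
# Values taken from the Experiment Setup row above.
LEARNING_RATES = [1e-3, 3e-3, 1e-2, 3e-2, 1e-1]
WEIGHT_DECAY = 0.01


def sgd_final_loss(lr, weight_decay=WEIGHT_DECAY, steps=200, target=2.0):
    """Hypothetical stand-in for a training run.

    Takes plain gradient steps on the toy loss 0.5 * (w - target)**2,
    adding an L2 weight-decay term to the gradient, and returns the
    final value of the toy loss.
    """
    w = 0.0
    for _ in range(steps):
        grad = (w - target) + weight_decay * w  # loss gradient plus decay term
        w -= lr * grad
    return 0.5 * (w - target) ** 2


def tune_learning_rate(rates=LEARNING_RATES):
    """Evaluate every rate in the grid and return (best_rate, all_results)."""
    results = {lr: sgd_final_loss(lr) for lr in rates}
    best_lr = min(results, key=results.get)
    return best_lr, results


best_lr, results = tune_learning_rate()
print("best learning rate on the toy problem:", best_lr)
```

In an actual reproduction, the toy run would be replaced by a full training and evaluation pass per learning rate, with the selection criterion chosen to match the paper's protocol.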