Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

The Benefits of Balance: From Information Projections to Variance Reduction

Authors: Lang Liu, Ronak Mehta, Soumik Pal, Zaid Harchaoui

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We illustrate how data balancing manifests in the motivating examples mentioned in Sec. 2 with experiments with CLIP-type models. We focus here on zero-shot image classification tasks. Details on these experiments, and additional ones including linear probing and zero-shot retrieval, as well as an empirical investigation of the sensitivity to misspecified marginals, are all contained in Appx. E.
Researcher Affiliation	Academia	University of Washington
Pseudocode	No	The paper describes algorithms and procedures in prose and mathematical notation but does not include explicit pseudocode blocks or algorithm listings.
Open Source Code	Yes	Code to reproduce the data and experiments can be found at https://github.com/ronakdm/balancing.
Open Datasets	Yes	For the training set, we use the ImageNet Captions dataset [Fang et al., 2013], which pairs images from ImageNet [Deng et al., 2009] that were taken from Flickr with their original captions.
Dataset Splits	No	The paper mentions training and test sets (E.1 Datasets), but does not explicitly describe validation splits or how they were used.
Hardware Specification	Yes	Experiments were run on a CPU/GPU workstation with 12 virtual cores, 126G of memory, and four NVIDIA TITAN Xp GPUs with 12G memory each.
Software Dependencies	No	The code was written in Python 3 and we use PyTorch for automatic differentiation. The OpenCLIP and CLIP Benchmark repositories were used for zero-shot evaluation. Specific version numbers for Python, PyTorch, or the mentioned repositories are not provided.
Experiment Setup	Yes	For optimization, models were trained with stochastic gradient descent (SGD) with the learning rate tuned along the grid 1e-3, 3e-3, 1e-2, 3e-2, 1e-1 and a fixed weight decay parameter of 0.01.
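
The learning-rate sweep described in the Experiment Setup row can be sketched as follows. This is a minimal illustration and not the authors' code: only the grid values and the weight decay of 0.01 come from the quoted text. The toy regression data, the helper names (`make_toy_data`, `train_sgd`, `sweep`), the number of epochs, and the use of a held-out split to pick the learning rate are all assumptions made for the example. Weight decay is applied by adding `weight_decay * w` to the gradient, which is one common convention and may differ from what the paper's code does.

```python
import random

# Values quoted in the Experiment Setup row.
LEARNING_RATES = [1e-3, 3e-3, 1e-2, 3e-2, 1e-1]
WEIGHT_DECAY = 0.01


def make_toy_data(n=200, seed=0):
    """Hypothetical 1-D regression data, y = 2x + noise (not from the paper)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys


def mse(w, xs, ys):
    """Mean squared error of the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def train_sgd(lr, xs, ys, weight_decay=WEIGHT_DECAY, epochs=20, seed=0):
    """Plain SGD on squared error; weight decay is added to the gradient."""
    rng = random.Random(seed)
    w = 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            grad = 2.0 * (w * xs[i] - ys[i]) * xs[i] + weight_decay * w
            w -= lr * grad
    return w


def sweep(xs_train, ys_train, xs_val, ys_val):
    """Train once per learning rate; return (best_lr, {lr: held-out MSE})."""
    results = {}
    for lr in LEARNING_RATES:
        w = train_sgd(lr, xs_train, ys_train)
        results[lr] = mse(w, xs_val, ys_val)
    best_lr = min(results, key=results.get)
    return best_lr, results


xs, ys = make_toy_data()
# Assumed 150/50 split, used here only to select the learning rate.
best_lr, results = sweep(xs[:150], ys[:150], xs[150:], ys[150:])
for lr, loss in results.items():
    print(f"lr={lr:g}  heldout_mse={loss:.4f}")
print("selected lr:", best_lr)
```

Fixing the seed for both the data and the shuffling keeps the sweep repeatable, so the learning rate is the only thing that varies between runs.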