Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Disentangling Sampling and Labeling Bias for Learning in Large-output Spaces
Authors: Ankit Singh Rawat, Aditya K Menon, Wittawat Jitkrittum, Sadeep Jayasumana, Felix Yu, Sashank Reddi, Sanjiv Kumar
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically verify our findings on long-tail classification and retrieval benchmarks. and (Section 5, Experiments) We now present experiments on benchmarks for both long-tail learning and retrieval, illustrating our main finding: existing negative sampling schemes, such as within-batch sampling with constant weighting, implicitly trade off performance on dominant versus rare labels. |
| Researcher Affiliation | Industry | 1Google Research, New York, USA. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement or link to the source code for the described methodology. |
| Open Datasets | Yes | We present results on long-tailed (LT) versions of the CIFAR-100 and ImageNet datasets. and In particular, we experiment with AmazonCat-13K and WikiLSHTC-325K datasets from the extreme classification literature (Agrawal et al., 2013; Bengio et al., 2019), where due to a large number of labels it is common to employ negative sampling. In addition, we also explored a small scale dataset Delicious from the repository to make our conclusions more general. |
| Dataset Splits | No | The paper mentions training on datasets and evaluating on a test set, but does not explicitly provide the training/validation/test split percentages or counts for all datasets in the main text. It states 'We report the test set balanced error,' which indicates a test set, but a clear validation split is not specified. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions training models (e.g., ResNet) but does not provide specific version numbers for software dependencies or libraries used. |
| Experiment Setup | Yes | We use m = 32 negatives on CIFAR-100, and m = 512 negatives on ImageNet. and We train a ResNet-56 for CIFAR and a ResNet-50 for ImageNet, using SGD with momentum; see Appendix E for details. |
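The Experiment Setup row refers to m sampled negatives per example. Because the report notes that no source code is released, the Python sketch below is only a rough, hypothetical illustration of a generic sampled-softmax loss and is not the authors' implementation. It scores the positive label together with m negatives drawn from a proposal distribution q and can optionally apply the common logQ importance correction, which subtracts log(m · q_j) from each negative's score. Several details are assumptions and may not match the paper's estimator: the function names, the removal of "accidental hits", and the choice to leave the positive score unadjusted.

```python
import math
import random


def full_softmax_loss(logits, positive):
    """Exact softmax cross-entropy over all labels (reference value)."""
    top = max(logits)
    lse = top + math.log(sum(math.exp(s - top) for s in logits))
    return lse - logits[positive]


def sampled_softmax_loss(logits, positive, m, q, correct=True, seed=0):
    """Hypothetical sampled-softmax cross-entropy with m negatives drawn from q.

    logits:   scores for all labels (list of floats).
    positive: index of the true label.
    m:        number of negatives drawn i.i.d. (with replacement) from q.
    q:        proposal probabilities over labels, summing to 1.
    correct:  if True, subtract log(m * q_j) from each negative's score
              (logQ / importance-weight correction); if False, every
              negative gets the same unit weight.
    """
    rng = random.Random(seed)
    draws = rng.choices(range(len(logits)), weights=q, k=m)
    # Assumption: drop "accidental hits", i.e. draws equal to the positive label.
    negatives = [j for j in draws if j != positive]

    terms = [logits[positive]]  # the positive score is left unadjusted
    for j in negatives:
        shift = math.log(m * q[j]) if correct else 0.0
        terms.append(logits[j] - shift)

    # Numerically stable log-sum-exp over the positive and the negatives.
    top = max(terms)
    lse = top + math.log(sum(math.exp(t - top) for t in terms))
    return lse - logits[positive]
```

As an example, take ten equal logits, positive label 3, and q uniform over the other nine labels. With m = 32, the corrected loss is log 10, which equals the full-softmax value. The uncorrected (unit-weight) loss is log 33. This shows how the weighting of sampled negatives shifts the objective.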