Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Easy Learning from Label Proportions
Authors: Róbert Busa-Fekete, Heejin Choi, Travis Dick, Claudio Gentile, Andres Munoz Medina
NeurIPS 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we empirically evaluate EASYLLP, PROPMATCH, and two baseline methods to characterize how their performance depends on the bag size for a range of different learning tasks and underlying learning models. ... Results. Figure 2 depicts the accuracy achieved by each method on a selection of datasets and models for a range of bag sizes. |
| Researcher Affiliation | Industry | Robert Busa-Fekete Google Research EMAIL; Heejin Choi Coupang Inc EMAIL; Travis Dick Google Research EMAIL; Claudio Gentile Google Research EMAIL; Andres Munoz Medina Google Research EMAIL |
| Pseudocode | Yes | We study a version of projected SGD that picks one example per bag and uses the soft-label corrected gradient estimates (pseudocode is given in Algorithm 1 in the Appendix). |
| Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of their methodology. |
| Open Datasets | Yes | We carry out experiments on four (binary classification) datasets: Binarized versions of MNIST [13] and CIFAR-10 [12], as well as the Higgs [3] and UCI adult datasets [11]. |
| Dataset Splits | No | The paper mentions tuning learning rates and hyperparameters, which implies the use of a validation set, but does not explicitly provide specific details about training/test/validation dataset splits (e.g., percentages or sample counts) for reproducibility. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU or CPU models, memory specifications) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'Adam [10] optimizer' and refers to frameworks like 'Tensorflow, JAX and PyTorch' but does not specify exact version numbers for any software dependencies. |
| Experiment Setup | Yes | To tune the learning rate for each method, we report the highest accuracy achieved for learning rates in {0.00001, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05}. ... In all cases we use the Adam [10] optimizer, binary crossentropy loss, minibatches of size 512, and 20 training passes through the data. Finally, for the two image datasets, we decay the learning rate after 40%, 60%, 80%, and 90% of the training passes by factors 10, 100, 1000, and 5000, respectively. |
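
The learning-rate decay quoted in the Experiment Setup row can be read in more than one way. The sketch below is not the authors' code: it assumes that each factor divides the base learning rate (so the rate is base/10 after 40% of training, base/100 after 60%, and so on) rather than compounding with the previous factors. The names `MILESTONES` and `decayed_learning_rate` are hypothetical and introduced only for illustration.

```python
# Values quoted in the Experiment Setup row.
LEARNING_RATES = [0.00001, 0.0001, 0.0005, 0.001, 0.005, 0.01, 0.05]
NUM_PASSES = 20
BATCH_SIZE = 512

# (fraction of training completed, divisor applied to the base rate).
# Assumption: divisors are relative to the base rate, not cumulative.
MILESTONES = [(0.4, 10.0), (0.6, 100.0), (0.8, 1000.0), (0.9, 5000.0)]


def decayed_learning_rate(base_lr: float, progress: float) -> float:
    """Return the learning rate after `progress` (in [0, 1]) of training."""
    divisor = 1.0
    for fraction, milestone_divisor in MILESTONES:
        # Milestones are sorted, so the last one reached determines the divisor.
        if progress >= fraction:
            divisor = milestone_divisor
    return base_lr / divisor


if __name__ == "__main__":
    base_lr = 0.01
    for epoch in range(NUM_PASSES):
        progress = epoch / NUM_PASSES
        print(epoch, decayed_learning_rate(base_lr, progress))
```

According to the quoted setup, this schedule applies only to the two image datasets (MNIST and CIFAR-10), and the reported accuracy for each method is the highest one obtained over the listed learning rates.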