Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
GraB: Finding Provably Better Data Permutations than Random Reshuffling
Authors: Yucheng Lu, Wentao Guo, Christopher M. De Sa
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically on applications including MNIST, CIFAR10, WikiText and GLUE that GraB can outperform random reshuffling in terms of both training and validation performance, and even outperform state-of-the-art greedy ordering while reducing memory usage over 100×. |
| Researcher Affiliation | Academia | Yucheng Lu, Wentao Guo, Christopher De Sa, Department of Computer Science, Cornell University |
| Pseudocode | Yes | Algorithm 1 Herding with Greedy Ordering |
| Open Source Code | Yes | The experimental code is available at https://github.com/EugeneLYC/GraB. |
| Open Datasets | Yes | We show empirically on applications including MNIST, CIFAR10, WikiText and GLUE that GraB can outperform random reshuffling in terms of both training and validation performance, and even outperform state-of-the-art greedy ordering while reducing memory usage over 100×. |
| Dataset Splits | No | The paper mentions using training and validation data, but does not specify explicit dataset split percentages, sample counts, or refer to predefined splits in the main text. |
| Hardware Specification | Yes | All the experiments run on an instance configured with a 4-core Intel(R) Xeon(R) 2.50GHz CPU, 32GB memory and an NVIDIA GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' as an example of an ML library, but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | No | Detailed information regarding models, datasets and hyperparameters can be found in Appendix A. |