Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Evaluating State-of-the-Art Classification Models Against Bayes Optimality
Authors: Ryan Theisen, Huan Wang, Lav R. Varshney, Caiming Xiong, Richard Socher
NeurIPS 2021 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We use our approach to conduct a thorough investigation of state-of-the-art classification models, and find that in some but not all cases, these models are capable of obtaining accuracy very near optimal. |
| Researcher Affiliation | Collaboration | Ryan Theisen (University of California, Berkeley), Huan Wang (Salesforce Research), Lav R. Varshney (University of Illinois Urbana-Champaign), Caiming Xiong (Salesforce Research), Richard Socher (you.com) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code can be found at https://github.com/salesforce/DataHardness. |
| Open Datasets | Yes | We train flow models on a wide variety of standard benchmark datasets: MNIST [19], Extended MNIST (EMNIST) [5], Fashion MNIST [36], CIFAR-10 [17], CIFAR-100 [17], SVHN [23], and Kuzushiji-MNIST [4]. |
| Dataset Splits | No | The paper mentions using 60,000 training samples and 10,000 testing samples but does not specify a validation set or its size. |
| Hardware Specification | Yes | The training and evaluation are done on a workstation with 2 NVIDIA V100 GPUs. |
| Software Dependencies | No | The paper mentions using a "pytorch implementation [13] of Glow [16]" but does not specify version numbers for PyTorch or other software dependencies. |
| Experiment Setup | Yes | In all our experiments, affine coupling layers are used, the number of steps of the flow in each level K = 16, the number of levels L = 3, and the number of channels in hidden layers C = 512. |
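
The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration object, which may help when checking a reproduction against the reported setup. This is only a sketch: the class name `GlowSetup` and its field names are hypothetical and are not taken from the paper or from the linked repository; only the values (affine coupling, K = 16, L = 3, C = 512) come from the row above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlowSetup:
    """Hyperparameters quoted in the Experiment Setup row.

    Field names are illustrative assumptions, not the repository's API.
    """
    coupling: str = "affine"     # affine coupling layers
    steps_per_level: int = 16    # K: flow steps in each level
    num_levels: int = 3          # L: number of levels
    hidden_channels: int = 512   # C: channels in hidden layers

    def total_flow_steps(self) -> int:
        # Each of the L levels applies K flow steps, so the total is K * L.
        return self.steps_per_level * self.num_levels


setup = GlowSetup()
print(setup.total_flow_steps())
```

With the reported values, the flow comprises K × L = 16 × 3 = 48 steps in total.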