Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
MixedNUTS: Training-Free Accuracy-Robustness Balance via Nonlinearly Mixed Classifiers
Authors: Yatong Bai, Mo Zhou, Vishal M. Patel, Somayeh Sojoudi
TMLR 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | On CIFAR-10, CIFAR-100, and ImageNet datasets, experimental results with custom strong adaptive attacks demonstrate MixedNUTS's vastly improved accuracy and near-SOTA robustness: it boosts CIFAR-100 clean accuracy by 7.86 points, sacrificing merely 0.87 points in robust accuracy. (Section 5, Experiments) |
| Researcher Affiliation | Academia | Yatong Bai EMAIL University of California, Berkeley; Mo Zhou EMAIL Johns Hopkins University; Vishal M. Patel EMAIL Johns Hopkins University; Somayeh Sojoudi EMAIL University of California, Berkeley |
| Pseudocode | Yes | Algorithm 1: Algorithm for optimizing s, p, c, and α. 1: Given an image set, save the predicted logits associated with mispredicted clean images {h_rob^LN(x) : x ∈ X_clean}. 2: Run MMAA on h_rob^LN(·) and save the logits of correctly classified perturbed inputs {h_rob^LN(x) : x ∈ X_adv}. |
| Open Source Code | Yes | Please refer to our source code for details on implementation. |
| Open Datasets | Yes | Our evaluation uses CIFAR-10 (Krizhevsky, 2009), CIFAR-100 (Krizhevsky, 2009), and ImageNet (Deng et al., 2009) datasets. |
| Dataset Splits | Yes | Uses 5000 validation images as specified in RobustBench. ... All mixed classifiers are evaluated with strengthened adaptive AutoAttack algorithms specialized in attacking MixedNUTS and do not manifest gradient obfuscation issues, with the details explained in Appendix B.2. ... As in Figure 6, 10000 clean examples and 1000 AutoAttack examples are used. |
| Hardware Specification | Yes | In practice, the triply-nested grid search loop can be completed within ten seconds on a laptop computer, and performing MMAA on 1000 images requires 3752/10172 seconds for CIFAR-100/ImageNet with a single Nvidia RTX-8000 GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch realization is Tensor.detach()' and refers to 'AutoAttack' and 'RobustBench' components, but does not provide specific version numbers for these software dependencies or any other libraries. |
| Experiment Setup | Yes | Our experiments use β = 98.5% for CIFAR-10 and -100, and use β = 99.0% for ImageNet. The optimal s, p, c values and the searching grid used in Algorithm 1 are discussed in Appendix C.1. ... In our experiments, we generate uniform linear intervals as the candidate values for the power coefficient p and the bias coefficient c, and use a log-scale interval for the scale coefficient s. Each interval has eight numbers, with the minimum and maximum values for the intervals listed in Table 3. |
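The grid construction and triply-nested search quoted above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the interval bounds below are placeholders (the actual ranges are in the paper's Table 3), and `objective` stands in for the paper's confidence-margin criterion over the saved logits.

```python
import numpy as np

def make_grids(p_range=(0.5, 4.0), c_range=(-1.0, 1.0), s_range=(1e-2, 1e2), n=8):
    """Candidate grids as described in the paper: uniform linear intervals for
    the power coefficient p and bias coefficient c, a log-scale interval for
    the scale coefficient s, eight values per interval. Bounds are placeholders."""
    p_grid = np.linspace(*p_range, n)
    c_grid = np.linspace(*c_range, n)
    s_grid = np.logspace(np.log10(s_range[0]), np.log10(s_range[1]), n)
    return s_grid, p_grid, c_grid

def grid_search(objective):
    """Triply-nested grid search over (s, p, c); `objective` is a hypothetical
    scoring function taking the three coefficients and returning a scalar."""
    s_grid, p_grid, c_grid = make_grids()
    best, best_val = None, -np.inf
    for s in s_grid:
        for p in p_grid:
            for c in c_grid:
                val = objective(s, p, c)
                if val > best_val:
                    best, best_val = (s, p, c), val
    return best, best_val
```

With 8 values per coefficient the search evaluates only 8³ = 512 combinations, which is consistent with the paper's report that the loop finishes within ten seconds on a laptop.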