Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Noise Matters: Optimizing Matching Noise for Diffusion Classifiers

Authors: Yanghao Wang, Long Chen

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive ablations on various datasets demonstrated the effectiveness of NoOp. It is worth noting that our noise optimization is orthogonal to existing optimization methods (e.g., prompt tuning), so NoOp can even benefit from these methods to further boost performance. Code is available at https://github.com/HKUST-LongGroup/NoOp. We evaluated the effectiveness of our method on eight few-shot classification datasets. Extensive ablation results showed the stability of NoOp.
Researcher Affiliation	Academia	Yanghao Wang, Long Chen; The Hong Kong University of Science and Technology; EMAIL, EMAIL
Pseudocode	No	The paper describes its methods and pipelines using text and diagrams (e.g., Figure 3: Pipeline of NoOp) but does not include any explicitly labeled pseudocode blocks or algorithms.
Open Source Code	Yes	Code is available at https://github.com/HKUST-LongGroup/NoOp.
Open Datasets	Yes	We evaluated three diffusion models, i.e., Stable Diffusion-v1.4, Stable Diffusion-v1.5 [34], and Stable Diffusion-v2.0 [35], across eight datasets: CIFAR-10 [36], CIFAR-100 [36], Flowers102 [37], DTD [15], Oxford Pets [38], EuroSAT [39], STL-10 [14], and FGVCAircraft [40].
Dataset Splits	Yes	We followed the few-shot evaluation protocol of CLIP [1], using 1, 2, 4, 8, and 16 shots for training and evaluating models on the full test sets. For the K-way N-shot image classification task, there is typically a training set D with K categories {c_1, ..., c_K}, each with N labeled training samples. Few-shot learning aims to improve the model using this training set so that it classifies the full test set more accurately.
Hardware Specification	Yes	All experiments are conducted on 32 NVIDIA V100 GPUs.
Software Dependencies	No	The paper mentions software components such as 'Discrete Euler' as a timestep scheduler, the 'U-Net' architecture, 'ReLU' activation, 'BatchNorm', and the 'Adam optimizer', but it does not specify version numbers for any libraries or frameworks (e.g., PyTorch, TensorFlow).
Experiment Setup	Yes	We used the Adam optimizer [41] with learning rates of 1e-2 and 1e-3 for the learnable noise and the Meta-Network, respectively. After training for 20 epochs, we reported the top-1 accuracies. Results are averaged over three random seeds. For fair comparisons, we fixed the timestep at t = 500. The training batch size is 32.
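The K-way N-shot protocol in the Dataset Splits row can be illustrated with a minimal sketch of how a training subset might be drawn. It assumes the full training set is available as a list of integer class labels and that the N shots per class are drawn uniformly at random with a fixed seed; `sample_few_shot` is a hypothetical helper, not code from the NoOp repository. The test set is left untouched, consistent with evaluation on the full test sets.

```python
import random
from collections import defaultdict


def sample_few_shot(labels, n_shot, seed):
    """Return indices of n_shot training examples per class (hypothetical helper).

    labels: sequence of class labels, one per training example.
    Raises ValueError if any class has fewer than n_shot examples.
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)  # per-seed RNG so each run's subset is reproducible
    selected = []
    for label in sorted(by_class):  # fixed class order keeps sampling deterministic
        pool = by_class[label]
        if len(pool) < n_shot:
            raise ValueError(f"class {label!r} has only {len(pool)} examples")
        selected.extend(rng.sample(pool, n_shot))  # without replacement
    return selected


# Usage: a toy 3-way task with 20 examples per class, at each shot count in the protocol.
toy_labels = [i % 3 for i in range(60)]
for n in (1, 2, 4, 8, 16):
    subset = sample_few_shot(toy_labels, n, seed=0)
    print(n, len(subset))
```

Seeding a dedicated `random.Random` instance is one way to keep the subsets behind a multi-seed average reproducible; whether the authors resample the shots for each of their three seeds is not stated in the row above.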
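The two learning rates in the Experiment Setup row correspond to running Adam with one parameter group per learnable component. The sketch below is a dependency-free, hypothetical illustration. `ParamGroup`, the group names `matching_noise` and `meta_network`, the toy parameter sizes, and the Adam betas and epsilon (the common defaults 0.9, 0.999, 1e-8) are assumptions, not details taken from the paper or its repository. Only the learning rates, epoch count, batch size, timestep, and number of seeds come from the row above.

```python
import math
from dataclasses import dataclass, field


@dataclass
class ParamGroup:
    """A named group of scalar parameters sharing one learning rate (hypothetical)."""
    name: str
    params: list  # flat list of floats, updated in place by Adam.step
    lr: float
    m: list = field(init=False, repr=False)  # first-moment (mean) estimates
    v: list = field(init=False, repr=False)  # second-moment (uncentered) estimates

    def __post_init__(self):
        self.m = [0.0] * len(self.params)
        self.v = [0.0] * len(self.params)


class Adam:
    """Standard Adam update with bias correction, applied per parameter group."""

    def __init__(self, groups, betas=(0.9, 0.999), eps=1e-8):
        self.groups = groups
        self.b1, self.b2 = betas  # assumed defaults; not listed in the excerpt
        self.eps = eps
        self.t = 0  # global step counter used for bias correction

    def step(self, grads):
        """grads maps each group name to a list of gradients matching its params."""
        self.t += 1
        for g in self.groups:
            for i, grad in enumerate(grads[g.name]):
                g.m[i] = self.b1 * g.m[i] + (1 - self.b1) * grad
                g.v[i] = self.b2 * g.v[i] + (1 - self.b2) * grad * grad
                m_hat = g.m[i] / (1 - self.b1 ** self.t)
                v_hat = g.v[i] / (1 - self.b2 ** self.t)
                g.params[i] -= g.lr * m_hat / (math.sqrt(v_hat) + self.eps)


# Reported settings; the dictionary itself is only an illustrative container.
CONFIG = {"epochs": 20, "batch_size": 32, "timestep": 500, "num_seeds": 3}

# Toy stand-ins: four scalars each instead of a real noise tensor or network.
noise = ParamGroup("matching_noise", [0.0] * 4, lr=1e-2)
meta = ParamGroup("meta_network", [0.0] * 4, lr=1e-3)
opt = Adam([noise, meta])
opt.step({"matching_noise": [1.0] * 4, "meta_network": [1.0] * 4})
print(noise.params[0], meta.params[0])
```

On the first step, Adam's bias-corrected update has a magnitude close to the group's learning rate, so the toy noise parameters move roughly ten times farther (about 1e-2) than the Meta-Network parameters (about 1e-3). In PyTorch, the equivalent configuration would pass two parameter-group dicts, each with its own `lr`, to `torch.optim.Adam`.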