Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

PseuZO: Pseudo-Zeroth-Order Algorithm for Training Deep Neural Networks

Authors: Pengyun Yue, Xuanlin Yang, Mingqing Xiao, Zhouchen Lin

NeurIPS 2025

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results demonstrate that PseuZO outperforms MeZO and MeZO-SVRG on classification, multiple-choice, and generation tasks, in both full-parameter and PEFT fine-tuning settings, by accelerating convergence in the early stages of training. For instance, under the same computation time on the SST-2 task, PseuZO achieves 9.8% higher accuracy than MeZO (91.2% vs. 82.4%).
Researcher Affiliation Collaboration (1) State Key Lab of General AI, School of Intelligence Science and Technology, Peking University; (2) Institute for Artificial Intelligence, Peking University; (3) Pazhou Laboratory (Huangpu), Guangzhou, Guangdong, China; (4) Microsoft Research Asia; (5) Zhongguancun Academy
Pseudocode Yes Algorithm 1: Matrix-based PseuZO Algorithm; Algorithm 2: Sliding Window-based PseuZO Algorithm
Open Source Code Yes The code is available at https://github.com/YangBigMn/PseuZO.
Open Datasets Yes We conduct comprehensive experiments on various tasks with large autoregressive language models such as OPT-1.3B [58], reusing MeZO's prompt design, which is effective and ensures a fair comparison, on datasets including the GLUE [52] and SuperGLUE [51] benchmarks. Table 7: Training from scratch on typical computer vision classification datasets for various feedback methods. We do not use local loss for MNIST, as the network has only two hidden layers.
Dataset Splits Yes We choose K = 16 as the batch size and randomly select 1024 samples for training and 512 samples for evaluation. WSC, CB, and COPA have far fewer samples in total, so for these tasks we set aside 100 samples for evaluation and use the rest for training.
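The split protocol above can be sketched as follows. This is a minimal illustration, not the authors' code: the helper name `split_samples`, the fixed `seed`, and the choice to draw training and evaluation samples from a single shuffled pool (so the two subsets never overlap) are assumptions; the text does not say which source split each subset comes from.

```python
import random

# Tasks whose datasets are too small for the default 1024/512 split (per the text).
SMALL_TASKS = {"WSC", "CB", "COPA"}


def split_samples(task, samples, n_train=1024, n_eval=512, n_eval_small=100, seed=0):
    """Hypothetical helper: return (train, eval) lists following the described protocol.

    Assumption: both subsets are drawn without replacement from one shuffled pool.
    """
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    pool = list(samples)
    rng.shuffle(pool)
    if task in SMALL_TASKS:
        # Hold out 100 samples for evaluation; everything else is training data.
        return pool[n_eval_small:], pool[:n_eval_small]
    # Default: 1024 random training samples and 512 disjoint evaluation samples.
    return pool[:n_train], pool[n_train:n_train + n_eval]
```

For example, a hypothetical 250-example pool for CB yields 150 training and 100 evaluation samples, while a large pool for SST-2 yields exactly 1024 and 512. Training would then draw minibatches of K = 16 from the returned training list.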
Hardware Specification Yes All experiments are run on a single NVIDIA A800 40GiB GPU.
Software Dependencies No The paper does not explicitly mention specific software dependencies with version numbers in the provided text.
Experiment Setup Yes Setup. We implement PseuZO, MeZO-SVRG, and HiZOO-L in the MeZO framework, with appropriate adjustments for a fair comparison. We conduct comprehensive experiments on various tasks with large autoregressive language models such as OPT-1.3B [58], reusing MeZO's prompt design, which is effective and ensures a fair comparison, on datasets including the GLUE [52] and SuperGLUE [51] benchmarks. We run all experiments for 10K steps and evaluate the model every 2K steps for HiZOO-L and MeZO-SVRG. To ensure that MeZO and PseuZO converge sufficiently, we run PseuZO and MeZO for 10K and 20K steps, respectively. We choose K = 16 as the batch size and randomly select 1024 samples for training and 512 samples for evaluation. All experiments are run on a single NVIDIA A800 40GiB GPU.
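For context, the MeZO framework in which the methods are implemented relies on a two-point (SPSA-style) zeroth-order estimate: perturb all parameters by +εz and -εz with z ~ N(0, I), take the finite difference of the two losses as a scalar projected gradient, and step along z, regenerating z from a stored random seed instead of keeping it in memory. The sketch below illustrates that baseline update on a toy quadratic with NumPy. It is not PseuZO's algorithm (Algorithms 1 and 2 are not reproduced here), and the function names, learning rate, ε, and toy loss are hypothetical.

```python
import numpy as np


def perturb(theta, seed, scale):
    """Add scale * z to theta in place, where z ~ N(0, I) is regenerated from `seed`."""
    z = np.random.default_rng(seed).standard_normal(theta.shape)
    theta += scale * z


def zo_sgd_step(theta, loss_fn, lr, eps, seed):
    """One MeZO-style two-point zeroth-order SGD step; updates theta in place.

    Hypothetical sketch: the same seed reproduces the same z for every
    perturbation, so no parameter-sized z buffer has to be kept between calls.
    """
    perturb(theta, seed, eps)            # theta + eps * z
    loss_plus = loss_fn(theta)
    perturb(theta, seed, -2.0 * eps)     # theta - eps * z
    loss_minus = loss_fn(theta)
    perturb(theta, seed, eps)            # restore theta
    projected_grad = (loss_plus - loss_minus) / (2.0 * eps)
    perturb(theta, seed, -lr * projected_grad)  # theta -= lr * projected_grad * z
    return projected_grad


if __name__ == "__main__":
    theta = np.ones(10)
    quadratic = lambda t: 0.5 * float(t @ t)  # toy loss with its minimum at 0
    print("initial loss:", quadratic(theta))
    for step in range(1000):
        zo_sgd_step(theta, quadratic, lr=0.01, eps=1e-3, seed=step)
    print("final loss:", quadratic(theta))
```

On this quadratic the central difference equals z·θ exactly, so each update is an unbiased but noisy estimate of a gradient step. The seed trick matters at LLM scale, where storing z would require a second parameter-sized buffer.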