Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
DeepZero: Scaling Up Zeroth-Order Optimization for Deep Model Training
Authors: Aochuan Chen, Yimeng Zhang, Jinghan Jia, James Diffenderfer, Konstantinos Parasyris, Jiancheng Liu, Yihua Zhang, Zheng Zhang, Bhavya Kailkhura, Sijia Liu
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our extensive experiments show that DeepZero achieves state-of-the-art (SOTA) accuracy on ResNet-20 trained on CIFAR-10, approaching FO training performance for the first time. |
| Researcher Affiliation | Collaboration | Michigan State University; Lawrence Livermore National Laboratory; UC Santa Barbara. (Footnote: equal contributions.) |
| Pseudocode | Yes | Algorithm 1: ZO-GraSP-oriented-LPR-guided ZO training |
| Open Source Code | Yes | Codes are available at https://github.com/OPTML-Group/DeepZero. |
| Open Datasets | Yes | training a ResNet20 on CIFAR-10 |
| Dataset Splits | No | No. The paper mentions training on CIFAR-10 for image classification and evaluating testing accuracy, but it does not explicitly provide the specific training, validation, and test dataset splits or a reference to how these splits were defined for reproduction. |
| Hardware Specification | Yes | Experiments are run on 4 NVIDIA V100 GPUs if not specified otherwise. |
| Software Dependencies | No | No. The paper mentions various optimizers (SGD, Adam) and a simulation code (PhiFlow) but does not provide specific version numbers for these software components or any other libraries like PyTorch or TensorFlow, which are essential for reproducibility. |
| Experiment Setup | Yes | We adopt SGD (stochastic gradient descent) as the FO training recipe, with a weight decay of 5 × 10^-4 and a momentum of 0.9. The learning rate is 0.1, governed by a cosine decay scheduler. In the ZO training scenario, we replace the FO gradient by (Sparse-CGE) with a smoothing parameter µ = 5 × 10^-3. When implementing ZO-GraSP (3), we set the query budget q = 192 and use the same µ as CGE. Unless specified otherwise, the weight sparsity ratio is chosen to be 90% and the specific sparsity patterns are determined by SR (Smart Ratio). When implementing DeepZero (Algorithm 2), we choose the number of epochs T = 50. |
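
To make the quoted experiment setup concrete, the following is a minimal sketch and not the authors' implementation. It assumes that the coordinate-wise gradient estimate (CGE) uses a forward difference with smoothing parameter µ, that the sparse variant simply skips pruned coordinates, and that the cosine schedule decays from 0.1 to 0 over the run. The names `sparse_cge`, `cosine_lr`, and `quadratic`, and the toy loss itself, are hypothetical and chosen only for illustration.

```python
import math


def sparse_cge(loss, theta, mask, mu=5e-3):
    """Forward-difference gradient estimate restricted to unmasked coordinates.

    Assumption: each kept coordinate i is estimated as
    (loss(theta + mu * e_i) - loss(theta)) / mu; pruned coordinates get 0.
    """
    base = loss(theta)
    grad = [0.0] * len(theta)
    for i, keep in enumerate(mask):
        if not keep:
            continue  # pruned coordinate: no function query is spent on it
        perturbed = list(theta)
        perturbed[i] += mu
        grad[i] = (loss(perturbed) - base) / mu
    return grad


def cosine_lr(step, total_steps, base_lr=0.1):
    """Cosine decay from base_lr at step 0 to 0 at total_steps (assumed form)."""
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * step / total_steps))


def quadratic(theta):
    """Toy loss used only for this illustration; its true gradient is 2 * theta."""
    return sum(x * x for x in theta)


theta = [1.0, -2.0, 0.5, 3.0]
mask = [True, False, True, False]  # toy 50% sparsity; the paper's default is 90%
estimate = sparse_cge(quadratic, theta, mask)
```

On this toy loss, each kept coordinate of `estimate` differs from the true gradient by µ, which is the forward-difference bias, while masked coordinates stay at zero. Whether the scheduler in the paper steps per epoch or per iteration is not stated in the quoted setup, so `total_steps` is left to the caller.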