pbSGD: Powered Stochastic Gradient Descent Methods for Accelerated Non-Convex Optimization

Authors: Beitong Zhou, Jun Liu, Weigao Sun, Ruijuan Chen, Claire Tomlin, Ye Yuan

IJCAI 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The purpose of this section is to demonstrate the efficiency and effectiveness of the proposed pbSGD and pbSGDM algorithms. We conduct experiments with different model architectures on several datasets, comparing against widely used optimization methods, including the non-adaptive method SGDM and three popular adaptive methods: AdaGrad, RMSprop, and Adam.
Researcher Affiliation | Academia | Beitong Zhou¹, Jun Liu², Weigao Sun¹, Ruijuan Chen¹, Claire Tomlin³ and Ye Yuan¹ — ¹School of Artificial Intelligence and Automation, Huazhong University of Science and Technology; ²Department of Applied Mathematics, University of Waterloo; ³Department of Electrical Engineering and Computer Sciences, UC Berkeley
Pseudocode | Yes | Pseudo-code of the proposed pbSGDM is detailed in Algorithm 2 (see the illustrative sketch after this table).
Open Source Code | No | The paper does not explicitly state that source code for the proposed methods (pbSGD/pbSGDM) is released, nor does it link to a code repository for their implementation; the footnotes only link to third-party model implementations.
Open Datasets | Yes | We conduct experiments with different model architectures on several datasets, comparing against widely used optimization methods... (CIFAR-10, CIFAR-100, ImageNet, MNIST).
Dataset Splits | No | The paper specifies a 'mini-batch size of 128 (except 256 in the ImageNet experiment)' and reports train and test accuracy. While hyperparameters are tuned, the paper does not explicitly describe a separate validation split or how one was used during training beyond what is implied by hyperparameter tuning.
Hardware Specification | No | The paper discusses training deep neural networks and running experiments but does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for these experiments.
Software Dependencies | No | The paper references 'pytorch-cifar' (https://github.com/kuangliu/pytorch-cifar) in footnotes for model architectures, implying that PyTorch was used, but it does not specify any software dependencies with version numbers (e.g., 'PyTorch 1.x' or 'CUDA 11.x').
Experiment Setup | Yes | The setup for each experiment is detailed in Table 1. In the first part, we present an empirical study of different deep neural network architectures to see how the proposed methods behave in terms of convergence speed and generalization. ... For all experiments, we used a mini-batch size of 128 (except 256 in the ImageNet experiment). An illustrative training-setup sketch follows this table.
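
For concreteness, below is a minimal PyTorch-style sketch of a pbSGD/pbSGDM-type optimizer, assuming the core update applies the element-wise Powerball transform sign(g)·|g|^γ (with γ ∈ (0, 1]) to the stochastic gradient before a plain SGD step (pbSGD) or a momentum step (pbSGDM). The class name `PoweredSGD`, the default hyperparameter values, the momentum handling, and the CIFAR-10/ResNet-18 usage snippet are illustrative assumptions, not the authors' implementation; refer to the pseudocode in the paper (e.g., Algorithm 2 for pbSGDM) for the exact updates.

```python
import torch
from torch.optim import Optimizer


class PoweredSGD(Optimizer):
    """Illustrative pbSGD/pbSGDM-style optimizer (not the authors' code).

    Assumption: each step applies the element-wise Powerball transform
    sign(g) * |g|**gamma to the stochastic gradient g, then performs a
    plain SGD step (pbSGD) or a heavy-ball momentum step (pbSGDM).
    """

    def __init__(self, params, lr=0.1, gamma=0.8, momentum=0.0):
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma is expected to lie in (0, 1]")
        defaults = dict(lr=lr, gamma=gamma, momentum=momentum)
        super().__init__(params, defaults)

    @torch.no_grad()
    def step(self, closure=None):
        loss = closure() if closure is not None else None
        for group in self.param_groups:
            lr, gamma, momentum = group["lr"], group["gamma"], group["momentum"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                # Powerball transform of the raw stochastic gradient.
                d = torch.sign(p.grad) * p.grad.abs().pow(gamma)
                if momentum != 0.0:
                    buf = self.state[p].get("momentum_buffer")
                    if buf is None:
                        buf = d.clone()
                        self.state[p]["momentum_buffer"] = buf
                    else:
                        buf.mul_(momentum).add_(d)
                    d = buf
                p.add_(d, alpha=-lr)
        return loss


# Usage sketch matching the reported setup (mini-batch size 128 on CIFAR-10);
# the dataset/model choices here are placeholders for the architectures in the paper.
if __name__ == "__main__":
    import torchvision
    import torchvision.transforms as T

    train_set = torchvision.datasets.CIFAR10(
        root="./data", train=True, download=True, transform=T.ToTensor()
    )
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)

    model = torchvision.models.resnet18(num_classes=10)
    optimizer = PoweredSGD(model.parameters(), lr=0.1, gamma=0.8, momentum=0.9)
    criterion = torch.nn.CrossEntropyLoss()

    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        break  # single illustrative step
```

Setting gamma=1.0 in this sketch reduces the update to ordinary SGD (or SGDM when momentum is nonzero), which is the natural baseline comparison reported in the paper.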