Global Convergence of Block Coordinate Descent in Deep Learning

Authors: Jinshan Zeng, Tim Tsz-Kit Lau, Shaobo Lin, Yuan Yao

ICML 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | As shown in Figure 1, it is observed that vanilla SGD fails to train a ten-hidden-layer MLP while BCD still works and achieves a moderate accuracy within a few epochs. Refer to Appendix F for details of this experiment.
Researcher Affiliation | Academia | (1) School of Computer and Information Engineering, Jiangxi Normal University, Nanchang 330022, Jiangxi, China; (2) Department of Mathematics, The Hong Kong University of Science and Technology, Clear Water Bay, Kowloon, Hong Kong; (3) Department of Statistics, Northwestern University, Evanston, IL 60208, USA; (4) Department of Mathematics, City University of Hong Kong, Kowloon, Hong Kong.
Pseudocode | Yes | Algorithm 1, Two-splitting BCD for DNN Training (Eq. 2.3), and Algorithm 2, Three-splitting BCD for DNN Training (Eq. 2.5).
Open Source Code | Yes | Code available at: https://github.com/timlautk/BCD-for-DNNs-PyTorch
Open Datasets | Yes | Figure 1. Comparison of training and test accuracies of BCD and SGD for training ten-hidden-layer MLPs on the MNIST dataset.
Dataset Splits | No | Figure 1. Comparison of training and test accuracies of BCD and SGD for training ten-hidden-layer MLPs on the MNIST dataset. Refer to Appendix F for details of this experiment. The paper mentions the MNIST dataset but does not explicitly specify the training, validation, and test splits (e.g., percentages, sample counts, or a citation to a standard split) in the main text.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU or CPU models.
Software Dependencies | No | Code available at: https://github.com/timlautk/BCD-for-DNNs-PyTorch. This indicates PyTorch is used, but no specific version numbers are provided for PyTorch or other software dependencies.
Experiment Setup | No | Refer to Appendix F for details of this experiment. The main text does not contain specific hyperparameter values, training configurations, or system-level settings.
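
The two-splitting scheme reported above alternates exact (or proximal) minimization over weight blocks and auxiliary state blocks of a quadratically penalized objective. As a rough illustration of that block structure — a minimal NumPy sketch assuming linear activations and a two-layer network, not the paper's Algorithm 1 (names `gamma`, `V1`, `V2` are illustrative) — each block update below is the closed-form minimizer in that block, so the objective is monotonically non-increasing:

```python
import numpy as np

# Hedged sketch of two-splitting-style BCD on a simplified objective:
#   F = ||V2 - Y||^2 + gamma*||V1 - W1 X||^2 + gamma*||V2 - W2 V1||^2
# (a linear-activation simplification; the paper's Eq. (2.3) uses
# nonlinear activations and general losses).
rng = np.random.default_rng(0)
d, h, k, n = 5, 4, 3, 50
X = rng.standard_normal((d, n))
Y = rng.standard_normal((k, d)) @ X  # targets from a random linear map

gamma = 1.0
W1 = rng.standard_normal((h, d))
W2 = rng.standard_normal((k, h))
V1 = W1 @ X
V2 = W2 @ V1

def objective():
    return (np.sum((V2 - Y) ** 2)
            + gamma * np.sum((V1 - W1 @ X) ** 2)
            + gamma * np.sum((V2 - W2 @ V1) ** 2))

losses = [objective()]
for _ in range(50):
    # V2-block: argmin ||V2 - Y||^2 + gamma*||V2 - W2 V1||^2
    V2 = (Y + gamma * W2 @ V1) / (1.0 + gamma)
    # W2-block: least squares V2 ~ W2 V1
    W2 = V2 @ np.linalg.pinv(V1)
    # V1-block: argmin ||V1 - W1 X||^2 + ||V2 - W2 V1||^2
    #   => (I + W2^T W2) V1 = W1 X + W2^T V2
    V1 = np.linalg.solve(np.eye(h) + W2.T @ W2, W1 @ X + W2.T @ V2)
    # W1-block: least squares V1 ~ W1 X
    W1 = V1 @ np.linalg.pinv(X)
    losses.append(objective())

# Exact block minimization makes the objective non-increasing per sweep.
assert all(b <= a + 1e-9 for a, b in zip(losses, losses[1:]))
print(f"objective: {losses[0]:.2f} -> {losses[-1]:.4f}")
```

This gradient-free block structure is what lets BCD sidestep the vanishing-gradient issue that, per Figure 1, stalls vanilla SGD on the ten-hidden-layer MLP.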