Adaptive Proximal Gradient Methods for Structured Neural Networks

Authors: Jihun Yun, Aurelie C. Lozano, Eunho Yang

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (4 Experiments) | We consider two important tasks for regularized training in deep learning communities: (i) training sparse neural networks and (ii) network quantization. Throughout our experiments, we consider ADAM as a representative of PROXGEN where m_t = ρ_t m_{t-1} + (1 - ρ_t) g_t with constant decaying parameter ρ_t = 0.9 and C_t = sqrt(β C_{t-1} + (1 - β) g_t^2) with β = 0.999 in Algorithm 1. The details on other hyperparameter/experiment settings are provided in the Appendix. Figure 2 illustrates the results for ResNet-34.
Researcher Affiliation | Collaboration | Jihun Yun, KAIST, arcprime@kaist.ac.kr; Aurelie C. Lozano, IBM T.J. Watson Research Center, aclozano@us.ibm.com; Eunho Yang, KAIST and AITRICS, eunhoy@kaist.ac.kr
Pseudocode | Yes | Algorithm 1 PROXGEN: A General Stochastic Proximal Gradient Method
Open Source Code | No | The paper does not provide any explicit statement about the release of source code for the described methodology, nor does it include a direct link to a code repository.
Open Datasets | Yes | We train ResNet-34 on the CIFAR-10 dataset. We consider training VGG-16 [51] and ResNet-34 [52] on the CIFAR-10 dataset using sparsity-encouraging regularizers. For comparisons, we quantize ResNet weight parameters (except bias and activations) on the CIFAR-10 and ImageNet datasets.
Dataset Splits | No | The paper mentions using standard datasets like CIFAR-10 and ImageNet, but it does not specify the training/validation/test splits (e.g., percentages, sample counts, or references to predefined splits) needed for reproduction.
Hardware Specification | No | The paper discusses algorithmic details and experimental results, but it does not specify any hardware used for training or experimentation, such as GPU models, CPU types, or cloud computing resources.
Software Dependencies | No | The paper mentions popular machine learning libraries like TensorFlow [23] and PyTorch [24] as background context, but it does not provide version numbers for these or any other software dependencies required to reproduce the experiments.
Experiment Setup | Yes | Throughout our experiments, we consider ADAM as a representative of PROXGEN where m_t = ρ_t m_{t-1} + (1 - ρ_t) g_t with constant decaying parameter ρ_t = 0.9 and C_t = sqrt(β C_{t-1} + (1 - β) g_t^2) with β = 0.999 in Algorithm 1. The details on other hyperparameter/experiment settings are provided in the Appendix.
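
The experiment setup quoted above instantiates PROXGEN with ADAM-style first- and second-moment estimates. The following is a minimal sketch of one such update combined with an L1 (soft-thresholding) proximal step for the sparse-training task; it is not the authors' released code. The function name proxgen_adam_step, the step size lr, the regularization weight lam, and the toy least-squares usage are illustrative assumptions, while the moment updates follow the quoted formulas.

import numpy as np

def soft_threshold(z, thresh):
    """Elementwise proximal operator of the L1 norm (soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)

def proxgen_adam_step(w, g, m, c, lr=1e-3, rho=0.9, beta=0.999, lam=1e-4, eps=1e-8):
    """One PROXGEN-style update with ADAM moments and an L1 prox (sketch).

    Moment updates follow the quoted rules:
        m_t = rho * m_{t-1} + (1 - rho) * g_t
        C_t = sqrt(beta * C_{t-1} + (1 - beta) * g_t**2)
    The L1 regularizer, lr, and lam are illustrative choices, not the paper's settings.
    """
    m = rho * m + (1.0 - rho) * g
    c = np.sqrt(beta * c + (1.0 - beta) * g ** 2)
    precond = c + eps                             # diagonal preconditioner plus damping
    z = w - lr * m / precond                      # preconditioned gradient step
    w = soft_threshold(z, lr * lam / precond)     # prox step scaled by the preconditioner
    return w, m, c

# Toy usage (assumed example): a few steps on a random least-squares problem with L1 penalty.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((32, 10)), rng.standard_normal(32)
w, m, c = np.zeros(10), np.zeros(10), np.zeros(10)
for _ in range(100):
    g = A.T @ (A @ w - b) / len(b)                # gradient of the smooth loss term
    w, m, c = proxgen_adam_step(w, g, m, c, lr=0.05, lam=0.1)

In this sketch the proximal step is taken with respect to the diagonal preconditioner, so the soft-threshold level lr * lam / precond varies per coordinate, which is what allows the adaptive method to retain exact zeros in the weights.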