Learning Neural Networks with Adaptive Regularization

Authors: Han Zhao, Yao-Hung Hubert Tsai, Russ R. Salakhutdinov, Geoffrey J. Gordon

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, we demonstrate that the proposed method helps networks converge to local optima with smaller stable ranks and spectral norms. These properties suggest better generalization, and we present empirical results to support this expectation. We also verify the effectiveness of the approach on multiclass classification and multitask regression problems with various network structures.
Researcher Affiliation | Collaboration | Han Zhao, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov, Geoffrey J. Gordon; Carnegie Mellon University, Microsoft Research Montreal; {han.zhao,yaohungt,rsalakhu}@cs.cmu.edu, geoff.gordon@microsoft.com
Pseudocode | Yes | Algorithm 1: Block Coordinate Descent for Adaptive Regularization (an illustrative sketch of this alternating scheme appears after the table).
Open Source Code | Yes | Our code is publicly available at: https://github.com/yaohungt/Adaptive-Regularization-Neural-Network.
Open Datasets | Yes | Multiclass Classification (MNIST & CIFAR10): In this experiment, we show that AdaReg provides an effective regularization on the network parameters. Multitask Regression (SARCOS): SARCOS relates to an inverse dynamics problem for a seven degree-of-freedom (DOF) SARCOS anthropomorphic robot arm [41].
Dataset Splits | No | The paper explicitly mentions training and test set sizes and usage for SARCOS, MNIST, and CIFAR10, but it does not describe a separate validation split or how one was constructed.
Hardware Specification | No | The acknowledgments mention an "Nvidia GPU grant" and "NVIDIA's GPU support," but the paper does not specify the hardware used for the experiments (e.g., GPU model, CPU type, memory).
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python version or deep learning framework versions such as PyTorch or TensorFlow).
Experiment Setup | Yes | We also note that we fix all the hyperparameters, such as the learning rate, to be the same for all the methods. We study two minibatch settings of 256 and 2048, respectively. In this experiment, we fix the number of outer loops to be 2/5, and each block optimization over network weights contains 50 epochs.
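
To make the alternating structure named in the Pseudocode row, together with the reported setup (2/5 outer loops, 50 epochs of weight updates per block, minibatches of 256 or 2048), concrete, here is a minimal PyTorch-style sketch. It is not the paper's Algorithm 1: a simple per-layer weighted L2 penalty stands in for the paper's matrix-variate adaptive regularizer, the closed-form update is a placeholder, and the names train_adareg_like, adaptive_penalty, and update_coeffs are hypothetical. The authors' actual implementation is in the linked repository.

```python
# Hypothetical sketch of a block-coordinate-descent loop in the spirit of
# "Algorithm 1: Block Coordinate Descent for Adaptive Regularization".
# Assumption: a per-layer weighted L2 penalty stands in for the paper's
# matrix-variate adaptive regularizer; the update rule below is a placeholder.

import torch
import torch.nn.functional as F


def adaptive_penalty(model, coeffs):
    """Weighted L2 penalty; `coeffs` plays the role of the adaptive block."""
    return sum(c * (w ** 2).sum() for c, w in zip(coeffs, model.parameters()))


def update_coeffs(model, eps=1e-3):
    """Placeholder closed-form update: penalize large-norm layers harder.
    The paper instead updates the parameters of a matrix-variate prior."""
    return [1.0 / (w.detach().norm() + eps) for w in model.parameters()]


def train_adareg_like(model, loader, outer_loops=2, epochs_per_block=50, lr=0.1):
    coeffs = [torch.tensor(1e-4) for _ in model.parameters()]
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(outer_loops):              # outer loop over the two blocks
        for _ in range(epochs_per_block):     # block 1: network weights
            for x, y in loader:
                opt.zero_grad()
                loss = F.cross_entropy(model(x), y) + adaptive_penalty(model, coeffs)
                loss.backward()
                opt.step()
        coeffs = update_coeffs(model)         # block 2: regularization parameters
    return model
```

In this sketch, `loader` would be a standard DataLoader with batch_size set to 256 or 2048, matching the two minibatch settings reported in the Experiment Setup row.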