Monotone, Bi-Lipschitz, and Polyak-Łojasiewicz Networks

Authors: Ruigang Wang, Krishnamurthy Dj Dvijotham, Ian Manchester

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we present experiments which explore the expressive quality of the proposed models, regularisation via model distortion, and performance of the DYS solution method.
Researcher Affiliation | Collaboration | 1. Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia. 2. Google DeepMind.
Pseudocode | Yes | For three-operator problems, the Davis-Yin splitting algorithm (DYS) (Davis & Yin, 2017) can be applied, obtaining the following fixed-point iteration: $z_{k+1/2} = \mathrm{prox}_{\alpha f}(u_k)$, $u_{k+1/2} = 2 z_{k+1/2} - u_k$, $z_{k+1} = R_A\left(u_{k+1/2} - \alpha C(z_{k+1/2})\right)$, $u_{k+1} = u_k + z_{k+1} - z_{k+1/2}$ (a runnable sketch of this iteration follows the table).
Open Source Code | Yes | Code is available at https://github.com/acfr/PLNet.
Open Datasets | Yes | Toy example. Using the two-moon dataset, we compare our $(\mu, \nu)$-Lipschitz network to an SNGP... CIFAR-10/100. ... 2D Rosenbrock function... We take 5K random training samples from the domain $[-2, 2] \times [-1, 3]$. ... ND Rosenbrock function... We take 10K random samples over the domain $[-2, 2]^{20}$ and do training with batch size of 200.
Dataset Splits | No | The paper describes training and testing data splits, but does not explicitly mention or detail a separate validation set for all experiments. For example, for the ND Rosenbrock function, it states: 'We take 10K random samples over the domain $[-2, 2]^{20}$ and do training with batch size of 200. We then use 500K samples for testing.'
Hardware Specification | Yes | Training the original SNGP takes about 95% GPU memory of an Nvidia RTX3090.
Software Dependencies | No | The paper mentions using 'ADAM (Kingma & Ba, 2015)' and 'SGD' as optimizers, and 'ReLU' as an activation function, but it does not specify software versions for any libraries (e.g., PyTorch, TensorFlow, scikit-learn) or programming languages used.
Experiment Setup | Yes | We choose ReLU as our default activation and use ADAM (Kingma & Ba, 2015) with a one-cycle linear learning rate (Coleman et al., 2017), except for the NGP case, which uses SGD with piecewise constant scheduling. For the NGP case, we use the cross-entropy loss, while the L2 loss is used for the rest of the examples. ... All models are trained for 200 epochs using the mini-batch stochastic gradient descent (SGD) method with batch size of 256. We adjust the learning rate based on a piecewise constant schedule. (A hedged configuration sketch is given after the table.)
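Below is a minimal, self-contained sketch of the generic Davis-Yin splitting iteration quoted in the Pseudocode row. The callables `prox_f`, `resolvent_A`, and `C`, the toy box-constrained lasso problem, and the step-size and iteration-count choices are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np

def davis_yin(prox_f, resolvent_A, C, u0, alpha=0.5, n_iter=200):
    """Generic Davis-Yin splitting (DYS) fixed-point iteration.

    prox_f      -- proximal operator of alpha*f
    resolvent_A -- resolvent R_A = (I + alpha*A)^(-1) of a monotone operator A
    C           -- a cocoercive operator (e.g. the gradient of a smooth term)
    """
    u = np.asarray(u0, dtype=float).copy()
    z = u
    for _ in range(n_iter):
        z_half = prox_f(u)                           # z_{k+1/2} = prox_{alpha f}(u_k)
        u_half = 2.0 * z_half - u                    # u_{k+1/2} = 2 z_{k+1/2} - u_k
        z = resolvent_A(u_half - alpha * C(z_half))  # z_{k+1}
        u = u + z - z_half                           # u_{k+1} = u_k + z_{k+1} - z_{k+1/2}
    return z

# Toy usage (illustrative only): minimise 0.5*||z - b||^2 + lam*||z||_1 over z in [0, 1]^3
alpha, lam = 0.5, 0.1
b = np.array([1.5, -0.3, 0.7])
prox_box = lambda u: np.clip(u, 0.0, 1.0)                                       # prox of the box indicator
soft_thresh = lambda u: np.sign(u) * np.maximum(np.abs(u) - alpha * lam, 0.0)   # resolvent of lam*||.||_1
grad_quad = lambda z: z - b                                                     # gradient of the smooth term
print(davis_yin(prox_box, soft_thresh, grad_quad, u0=np.zeros(3), alpha=alpha))
```

In the paper the same iteration pattern is instantiated with model-specific operators; the operators above are only stand-ins to show the update structure.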
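For the Experiment Setup row, the following is a hedged PyTorch-style sketch of a training loop with the quoted batch size, epoch count, loss choices, and learning-rate schedules. The model, data, learning rates, momentum, and schedule milestones are placeholder assumptions; the actual architectures and data pipelines are in the released code at https://github.com/acfr/PLNet.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model; the (mu, nu)-Lipschitz / PLNet architectures
# from the paper are not reproduced here.
x, y = torch.randn(10_000, 20), torch.randn(10_000, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=256, shuffle=True)
model = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 1))

epochs = 200
loss_fn = nn.MSELoss()                        # L2 loss for the regression examples
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.MultiStepLR(   # piecewise constant schedule (milestones are assumptions)
    optimizer, milestones=[100, 150], gamma=0.1
)
# Alternative quoted for the other examples: ADAM with a one-cycle schedule, e.g.
# optimizer = optim.Adam(model.parameters(), lr=1e-3)
# scheduler = optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1e-2, epochs=epochs,
#                                           steps_per_epoch=len(loader), anneal_strategy="linear")
# (OneCycleLR would then be stepped once per batch rather than once per epoch.)

for epoch in range(epochs):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
    scheduler.step()                          # MultiStepLR steps once per epoch
```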