Monotone, Bi-Lipschitz, and Polyak-Łojasiewicz Networks

Authors: Ruigang Wang, Krishnamurthy Dj Dvijotham, Ian Manchester

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Here we present experiments which explore the expressive quality of the proposed models, regularisation via model distortion, and performance of the DYS solution method.
Researcher Affiliation | Collaboration | 1. Australian Centre for Robotics, School of Aerospace, Mechanical and Mechatronic Engineering, The University of Sydney, Sydney, NSW 2006, Australia. 2. Google DeepMind.
Pseudocode | Yes | For three-operator problems, the Davis-Yin splitting algorithm (DYS) (Davis & Yin, 2017) can be applied, obtaining the following fixed-point iteration: $z_{k+1/2} = \mathrm{prox}_{\alpha f}(u_k)$, $u_{k+1/2} = 2 z_{k+1/2} - u_k$, $z_{k+1} = R_A\left(u_{k+1/2} - \alpha C(z_{k+1/2})\right)$, $u_{k+1} = u_k + z_{k+1} - z_{k+1/2}$ (a runnable sketch of this iteration follows the table).
Open Source Code | Yes | Code is available at https://github.com/acfr/PLNet.
Open Datasets | Yes | Toy example. Using the two-moon dataset, we compare our $(\mu, \nu)$-Lipschitz network to an SNGP... CIFAR-10/100. ... 2D Rosenbrock function... We take 5K random training samples from the domain $[-2, 2] \times [-1, 3]$. ... ND Rosenbrock function... We take 10K random samples over the domain $[-2, 2]^{20}$ and do training with batch size of 200.
Dataset Splits | No | The paper describes training and testing data splits, but does not explicitly mention or detail a separate validation set for all experiments. For example, for the ND Rosenbrock function, it states: 'We take 10K random samples over the domain $[-2, 2]^{20}$ and do training with batch size of 200. We then use 500K samples for testing.'
Hardware Specification | Yes | Training the original SNGP takes about 95% GPU memory of an Nvidia RTX3090.
Software Dependencies | No | The paper mentions using 'ADAM (Kingma & Ba, 2015)' and 'SGD' as optimizers, and 'ReLU' as an activation function, but it does not specify software versions for any libraries (e.g., PyTorch, TensorFlow, scikit-learn) or programming languages used.
Experiment Setup | Yes | We choose ReLU as our default activation and use ADAM (Kingma & Ba, 2015) with a one-cycle linear learning rate (Coleman et al., 2017), except for the NGP case, which uses SGD with piecewise constant scheduling. For the NGP case, we use the cross-entropy loss, while the L2 loss is used for the rest of the examples. ... All models are trained for 200 epochs using the mini-batch stochastic gradient descent (SGD) method with batch size of 256. We adjust the learning rate based on a piecewise constant schedule. (A hedged configuration sketch is given after the table.)
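Below is a minimal, self-contained sketch of the generic Davis-Yin splitting iteration quoted in the Pseudocode row. The callables `prox_f`, `resolvent_A`, and `C`, the toy box-constrained lasso problem, and the step-size and iteration-count choices are illustrative assumptions, not taken from the paper or its repository.

```python
import numpy as np

def davis_yin(prox_f, resolvent_A, C, u0, alpha=0.5, n_iter=200):
    """Generic Davis-Yin splitting (DYS) fixed-point iteration.

    prox_f      -- proximal operator of alpha*f
    resolvent_A -- resolvent R_A = (I + alpha*A)^(-1) of a monotone operator A
    C           -- a cocoercive operator (e.g. the gradient of a smooth term)
    """
    u = np.asarray(u0, dtype=float).copy()
    z = u
    for _ in range(n_iter):
        z_half = prox_f(u)                           # z_{k+1/2} = prox_{alpha f}(u_k)
        u_half = 2.0 * z_half - u                    # u_{k+1/2} = 2 z_{k+1/2} - u_k
        z = resolvent_A(u_half - alpha * C(z_half))  # z_{k+1}
        u = u + z - z_half                           # u_{k+1} = u_k + z_{k+1} - z_{k+1/2}
    return z

# Toy usage (illustrative only): minimise 0.5*||z - b||^2 + lam*||z||_1 over z in [0, 1]^3
alpha, lam = 0.5, 0.1
b = np.array([1.5, -0.3, 0.7])
prox_box = lambda u: np.clip(u, 0.0, 1.0)                                       # prox of the box indicator
soft_thresh = lambda u: np.sign(u) * np.maximum(np.abs(u) - alpha * lam, 0.0)   # resolvent of lam*||.||_1
grad_quad = lambda z: z - b                                                     # gradient of the smooth term
print(davis_yin(prox_box, soft_thresh, grad_quad, u0=np.zeros(3), alpha=alpha))
```

In the paper the same iteration pattern is instantiated with model-specific operators; the operators above are only stand-ins to show the update structure.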
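For the Experiment Setup row, the following is a hedged PyTorch-style sketch of a training loop with the quoted batch size, epoch count, loss choices, and learning-rate schedules. The model, data, learning rates, momentum, and schedule milestones are placeholder assumptions; the actual architectures and data pipelines are in the released code at https://github.com/acfr/PLNet.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data and model; the (mu, nu)-Lipschitz / PLNet architectures
# from the paper are not reproduced here.
x, y = torch.randn(10_000, 20), torch.randn(10_000, 1)
loader = DataLoader(TensorDataset(x, y), batch_size=256, shuffle=True)
model = nn.Sequential(nn.Linear(20, 128), nn.ReLU(), nn.Linear(128, 1))

epochs = 200
loss_fn = nn.MSELoss()                        # L2 loss for the regression examples
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = optim.lr_scheduler.MultiStepLR(   # piecewise constant schedule (milestones are assumptions)
    optimizer, milestones=[100, 150], gamma=0.1
)
# Alternative quoted for the other examples: ADAM with a one-cycle schedule, e.g.
# optimizer = optim.Adam(model.parameters(), lr=1e-3)
# scheduler = optim.lr_scheduler.OneCycleLR(optimizer, max_lr=1e-2, epochs=epochs,
#                                           steps_per_epoch=len(loader), anneal_strategy="linear")
# (OneCycleLR would then be stepped once per batch rather than once per epoch.)

for epoch in range(epochs):
    for xb, yb in loader:
        optimizer.zero_grad()
        loss_fn(model(xb), yb).backward()
        optimizer.step()
    scheduler.step()                          # MultiStepLR steps once per epoch
```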