The Star Geometry of Critic-Based Regularizer Learning

Authors: Oscar Leong, Eliza O'Reilly, Yong Sheng Soh

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also experimentally show that such losses can be competitive for learning regularizers in a simple denoising setting. An empirical comparison between neural-network-based regularizers learned using these losses and the adversarial loss is presented in Section 3.1.
Researcher Affiliation | Academia | Oscar Leong, Department of Statistics and Data Science, University of California, Los Angeles (oleong@stat.ucla.edu); Eliza O'Reilly, Department of Applied Mathematics and Statistics, Johns Hopkins University (eoreill2@jh.edu); Yong Sheng Soh, Department of Mathematics, National University of Singapore (matsys@nus.edu.sg)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper contains no statement or link indicating that source code for the described methodology is openly available.
Open Datasets | Yes | To do this, we consider denoising on the MNIST dataset [50]. We take 10,000 random samples from the MNIST training set (constituting our D_r distribution) and add Gaussian noise with variance σ² = 0.05 (constituting our D_n distribution). (A data-construction sketch follows the table.)
Dataset Splits | No | The paper does not explicitly mention a validation set or provide details on how the data was split for validation.
Hardware Specification | Yes | The experiments were run on a single NVIDIA A100 GPU.
Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not provide version numbers for any software dependencies or libraries.
Experiment Setup | Yes | They were trained using the adversarial loss and the Hellinger-based loss (5). We also used the gradient penalty term from [53] for both losses. We used the Adam optimizer for 20,000 epochs with a learning rate of 10⁻³. We ran gradient descent for 2,000 iterations with a learning rate of 10⁻³. For the choice of regularization parameter λ, we note that in [53] the authors fix this value to λ := 2λ̄, where λ̄ := E_{z∼N(0,σ²I)}[‖z‖_{ℓ2}], as a regularizer that achieves a small gradient penalty will be (approximately) 1-Lipschitz. For the Hellinger-based network, we found that λ = 5.1λ̄² gave better performance, so we used this for recovery. We additionally tuned the regularization strength for the adversarially trained regularizer and found that λ = 0.75λ̄ performed better than the original fixed value. (A training-and-recovery sketch follows the table.)
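
The Open Datasets row describes how the two distributions are built. Below is a minimal sketch of that construction, assuming PyTorch/torchvision; the paper does not state its framework, and the preprocessing here (ToTensor scaling) is our assumption.

```python
# Minimal sketch of the denoising data setup, assuming PyTorch/torchvision.
# The paper does not state its framework; preprocessing is our assumption.
import torch
from torchvision import datasets, transforms

mnist = datasets.MNIST(root="./data", train=True, download=True,
                       transform=transforms.ToTensor())

# D_r: 10,000 random clean samples from the MNIST training set.
idx = torch.randperm(len(mnist))[:10_000].tolist()
clean = torch.stack([mnist[i][0] for i in idx])

# D_n: the same images corrupted by additive Gaussian noise, variance 0.05.
sigma2 = 0.05
noisy = clean + torch.randn_like(clean) * sigma2 ** 0.5
```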
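
The Experiment Setup row compresses several steps: critic-style training with a gradient penalty, the Monte Carlo baseline λ̄, and recovery by gradient descent on a regularized objective. The sketch below shows these steps in PyTorch under stated assumptions: the network architecture, the penalty weight `mu`, and the shortened loop length are illustrative choices, and the Hellinger-based loss (5) is omitted since its exact form is not quoted here.

```python
# Hedged sketch of critic training and recovery; architecture, mu, and data
# are illustrative stand-ins, not the paper's exact configuration.
import torch
import torch.nn as nn

R = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),  # scalar-output
                  nn.ReLU(), nn.Linear(128, 1))           # critic R_theta

def gradient_penalty(x_clean, x_noisy):
    """Penalty in the spirit of [53], pushing ||grad_x R|| toward 1 on
    interpolates so the learned regularizer is approximately 1-Lipschitz."""
    eps = torch.rand(x_clean.size(0), 1, 1, 1)
    x_hat = (eps * x_clean + (1 - eps) * x_noisy).requires_grad_(True)
    g, = torch.autograd.grad(R(x_hat).sum(), x_hat, create_graph=True)
    return ((g.flatten(1).norm(dim=1) - 1.0) ** 2).mean()

def adversarial_loss(x_clean, x_noisy, mu=10.0):
    # Critic loss: R should be small on clean images, large on noisy ones.
    return (R(x_clean).mean() - R(x_noisy).mean()
            + mu * gradient_penalty(x_clean, x_noisy))

# Training with Adam at learning rate 1e-3 (the paper reports 20,000 epochs;
# shortened here for illustration), on stand-in data for D_r / D_n.
clean = torch.rand(64, 1, 28, 28)
noisy = clean + torch.randn_like(clean) * 0.05 ** 0.5
opt = torch.optim.Adam(R.parameters(), lr=1e-3)
for _ in range(100):
    opt.zero_grad()
    adversarial_loss(clean, noisy).backward()
    opt.step()

# lambda_bar := E_{z ~ N(0, sigma^2 I)}[||z||_2], estimated by Monte Carlo;
# the paper then tunes lambda = 0.75 * lambda_bar for the adversarial critic.
lambda_bar = (torch.randn(10_000, 28 * 28) * 0.05 ** 0.5).norm(dim=1).mean()

def recover(y, lam, n_iters=2000, lr=1e-3):
    """Denoise y by gradient descent on 0.5 * ||x - y||^2 + lam * R(x)."""
    x = y.clone().requires_grad_(True)
    for _ in range(n_iters):
        loss = 0.5 * (x - y).pow(2).sum() + lam * R(x).sum()
        g, = torch.autograd.grad(loss, x)
        x = (x - lr * g).detach().requires_grad_(True)
    return x.detach()

denoised = recover(noisy[:1], lam=0.75 * lambda_bar)
```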