Nuclear Norm Regularization for Deep Learning

Authors: Christopher Scarvelis, Justin M. Solomon

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We complement our theoretical results with an empirical study of our regularizer's performance on synthetic data. As the Jacobian nuclear norm has seldom been used as a regularizer in deep learning, we propose applications of our method to unsupervised denoising, where one trains a denoiser given a dataset of noisy images without access to their clean counterparts, and to representation learning. Our work makes the Jacobian nuclear norm a feasible component of deep learning pipelines, enabling users to learn locally low-rank functions unencumbered by the heavy cost of naïve Jacobian nuclear norm regularization.
Researcher Affiliation | Academia | Christopher Scarvelis (MIT CSAIL, scarv@mit.edu) and Justin Solomon (MIT CSAIL, jsolomon@mit.edu)
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks with explicit labels such as 'Pseudocode' or 'Algorithm'.
Open Source Code | Yes | We have included full experimental details in Appendix B and also attached code for our experiments. We have uploaded a zip file with our submission that includes code for training our models and running our experiments.
Open Datasets | Yes | We train our denoiser by solving (11) with D(Ω) being the empirical distribution over 288k noisy images from the Imagenet training set [Russakovsky et al., 2015].
Dataset Splits | No | The paper evaluates on the ImageNet validation set, but it does not specify how its *own* training data was split into training, validation, and test sets. It uses a pre-defined validation set for evaluation, not as a split from its primary training data.
Hardware Specification | Yes | Each training run for (8) takes approximately 2 hours, and each training run for (9) takes approximately 45 minutes on a single V100 GPU. Each denoising model takes approximately 5 hours to train on a single V100 GPU. Training this β-VAE takes approximately 30 minutes on a single V100 GPU. Training these autoencoders takes approximately 4 hours each on a single V100 GPU.
Software Dependencies | No | The paper mentions using the 'torch-dct package' but does not provide specific version numbers for this or any other key software dependencies (e.g., PyTorch, Python, CUDA).
Experiment Setup | Yes | We train all neural models using the AdamW optimizer [Loshchilov and Hutter, 2019] at a learning rate of 10^-4 for 100,000 iterations with a batch size of 10,000. We employ a warmup strategy for solving our problem (9). We first train our neural nets at η = 0.05 in the n = 2 case and η = 0.01 in the n = 5 case for 10,000 iterations, and then increase η by 0.05 and 0.01, respectively, every 10,000 iterations until we reach the desired value of η. We then continue training until we reach 100,000 total iterations.
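For readers unfamiliar with the quantity being regularized, a minimal sketch may help: the nuclear norm is the sum of a matrix's singular values, and for a 2x2 Jacobian it has the closed form ||J||_* = sqrt(||J||_F^2 + 2|det J|). The function below is purely illustrative and is not the paper's method; the paper's contribution is precisely avoiding this kind of exact, per-sample computation at scale.

```python
import math

def nuclear_norm_2x2(J):
    """Nuclear norm (sum of singular values) of a 2x2 matrix J,
    via the closed form ||J||_* = sqrt(||J||_F^2 + 2*|det J|).

    Illustrative only (hypothetical helper, not from the paper):
    naive Jacobian nuclear norm regularization would evaluate a
    quantity like this for every sample's Jacobian at every step.
    """
    (a, b), (c, d) = J
    frob_sq = a * a + b * b + c * c + d * d   # ||J||_F^2 = sum of squares
    det = abs(a * d - b * c)                  # |det J| = product of singular values
    return math.sqrt(frob_sq + 2 * det)

# diag(3, 4) has singular values 3 and 4, so its nuclear norm is 7
assert abs(nuclear_norm_2x2([[3.0, 0.0], [0.0, 4.0]]) - 7.0) < 1e-12
```

The identity follows from (σ1 + σ2)^2 = (σ1^2 + σ2^2) + 2σ1σ2, since σ1^2 + σ2^2 = ||J||_F^2 and σ1σ2 = |det J|.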
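The warmup strategy quoted in the Experiment Setup row (raise η by a fixed step every 10,000 iterations until a target is reached) can be sketched as a piecewise-constant schedule. This is a reading of the prose description only: the function name, argument names, and the target value used in the example are our own, not taken from the paper.

```python
def eta_schedule(iteration, eta_init, eta_step, eta_target, warmup_every=10_000):
    """Piecewise-constant warmup for the regularization weight eta.

    Start at eta_init and raise eta by eta_step every `warmup_every`
    iterations, capping at eta_target; training then continues at
    eta_target. Hypothetical helper matching the prose schedule.
    """
    eta = eta_init + (iteration // warmup_every) * eta_step
    return min(eta, eta_target)

# n = 2 case from the quoted setup: start at 0.05, step by 0.05.
# The target 0.25 is a placeholder for "the desired value of eta".
assert eta_schedule(0, 0.05, 0.05, 0.25) == 0.05
assert eta_schedule(10_000, 0.05, 0.05, 0.25) == 0.10
```

The n = 5 case uses the same shape with eta_init = eta_step = 0.01; after the cap is reached, training simply runs at the target η until 100,000 total iterations.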