Nuclear Norm Regularization for Deep Learning
Authors: Christopher Scarvelis, Justin M. Solomon
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We complement our theoretical results with an empirical study of our regularizer's performance on synthetic data. As the Jacobian nuclear norm has seldom been used as a regularizer in deep learning, we propose applications of our method to unsupervised denoising, where one trains a denoiser given a dataset of noisy images without access to their clean counterparts, and to representation learning. Our work makes the Jacobian nuclear norm a feasible component of deep learning pipelines, enabling users to learn locally low-rank functions unencumbered by the heavy cost of naïve Jacobian nuclear norm regularization. |
| Researcher Affiliation | Academia | Christopher Scarvelis MIT CSAIL scarv@mit.edu Justin Solomon MIT CSAIL jsolomon@mit.edu |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks with explicit labels like 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | We have included full experimental details in Appendix B and also attached code for our experiments. We have uploaded a zip file with our submission that includes code for training our models and running our experiments. |
| Open Datasets | Yes | We train our denoiser by solving (11) with D(Ω) being the empirical distribution over 288k noisy images from the Imagenet training set [Russakovsky et al., 2015]. |
| Dataset Splits | No | The paper evaluates on the ImageNet validation set, but it does not specify how its *own* training data was split into training, validation, and test sets. It uses a pre-defined validation set for evaluation, not as a split from its primary training data. |
| Hardware Specification | Yes | Each training run for (8) takes approximately 2 hours, and each training run for (9) takes approximately 45 minutes on a single V100 GPU. Each denoising model takes approximately 5 hours to train on a single V100 GPU. Training this β-VAE takes approximately 30 minutes on a single V100 GPU. Training these autoencoders takes approximately 4 hours each on a single V100 GPU. |
| Software Dependencies | No | The paper mentions using the 'torch-dct package' but does not provide specific version numbers for this or any other key software dependencies (e.g., PyTorch, Python, CUDA). |
| Experiment Setup | Yes | We train all neural models using the AdamW optimizer [Loshchilov and Hutter, 2019] at a learning rate of 10^-4 for 100,000 iterations with a batch size of 10,000. We employ a warmup strategy for solving our problem (9). We first train our neural nets at η = 0.05 in the n = 2 case and η = 0.01 in the n = 5 case for 10,000 iterations, and then increase η by 0.05 and 0.01, respectively, each 10,000 iterations until we reach the desired value of η. We then continue training until we reach 100,000 total iterations. |
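To make the "heavy cost of naïve Jacobian nuclear norm regularization" mentioned in the Research Type row concrete, here is a minimal NumPy sketch of the naïve approach the paper seeks to avoid: materialize the full Jacobian (here by finite differences, so one extra function evaluation per input dimension) and sum its singular values. The function `jacobian_nuclear_norm` and its argument names are illustrative, not from the paper.

```python
import numpy as np

def jacobian_nuclear_norm(f, x, eps=1e-5):
    """Naive Jacobian nuclear norm of f at x.

    Builds the full Jacobian column-by-column via forward finite
    differences (n extra evaluations of f for an n-dimensional input),
    then sums the singular values. This O(n) cost per point is exactly
    what makes naive Jacobian nuclear norm regularization expensive.
    """
    n = x.size
    fx = f(x)
    J = np.empty((fx.size, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = eps
        J[:, i] = (f(x + e) - fx) / eps
    return np.linalg.svd(J, compute_uv=False).sum()

# Sanity check on a linear map: the Jacobian of x -> A @ x is A itself,
# so the result should equal the nuclear norm of A (sum of its
# singular values), here 2 + 3 = 5 for a diagonal A.
A = np.diag([2.0, 3.0])
print(jacobian_nuclear_norm(lambda x: A @ x, np.zeros(2)))
```

For a linear map the finite-difference Jacobian is exact, so the printed value is 5 up to floating-point error; for a neural network this loop would run once per input dimension per training point, which is the cost the paper's regularizer sidesteps.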
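The warmup strategy quoted in the Experiment Setup row (start at a small η, add a fixed increment every 10,000 iterations until the target η is reached) can be sketched as a simple step schedule. This is a hedged reconstruction: the function name, the `eta_target` value, and the default arguments below are illustrative; the paper only specifies the initial value, the increment (0.05 for n = 2, 0.01 for n = 5), and the 10,000-iteration period.

```python
def eta_schedule(step, eta_init=0.05, eta_increment=0.05,
                 eta_target=0.25, period=10_000):
    """Stepwise warmup for the regularization weight eta.

    Starts at eta_init and adds eta_increment once every `period`
    training iterations, capping at eta_target; training then
    continues at eta_target until 100,000 total iterations.
    (eta_target here is a placeholder for the paper's desired eta.)
    """
    eta = eta_init + (step // period) * eta_increment
    return min(eta, eta_target)

# Example: eta over the first few warmup stages (n = 2 settings).
for step in (0, 9_999, 10_000, 20_000, 50_000):
    print(step, round(eta_schedule(step), 2))
```

With these defaults, η stays at 0.05 through iteration 9,999, jumps to 0.10 at iteration 10,000, and plateaus at the target 0.25 from iteration 40,000 onward, matching the quoted description up to the unspecified target value.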