Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling
Authors: Mingze Wang, Zeping Min, Lei Wu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks. |
| Researcher Affiliation | Academia | 1School of Mathematical Sciences, Peking University, Beijing, China 2Center for Machine Learning Research, Peking University, Beijing, China. |
| Pseudocode | Yes | Algorithm 1 Progressive Rescaling Gradient Descent (PRGD) |
| Open Source Code | No | The paper does not provide a specific link or explicit statement about releasing the source code for the methodology described. |
| Open Datasets | Yes | Specifically, we employ the digit datasets from Sklearn, which are image classification tasks with d = 64, n = 300. ... VGG-16 network (Simonyan & Zisserman, 2015) on the full CIFAR-10 dataset (Krizhevsky & Hinton, 2009) |
| Dataset Splits | No | The paper mentions training and testing datasets, but it does not explicitly specify a validation dataset split or how validation was performed. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'VGG architecture implemented in PyTorch' but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | The network was trained using a batch size of 64... We used a base learning rate of 1e-3, a momentum of 0.9, and a weight decay of 5e-4. ... We configured PRGD with T_k = 3000 * 2 + k * 3000 * 3, R_k = min(R_0 * 2 + k * 3000 * 0.2, 1000). (A hedged code sketch of the PRGD phase schedule and update is given after this table.) |
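
Below is a minimal NumPy sketch of the PRGD idea referenced in the Pseudocode and Experiment Setup rows: run plain gradient descent on the exponential loss of a linear classifier, and at each phase boundary rescale the parameter norm to a prescribed radius. The function names (`prgd`, `exp_loss_grad`), the toy separable dataset, and the phase schedule `T`/`R` below are illustrative assumptions rather than the paper's configuration; the paper's CIFAR-10 schedule is the T_k/R_k expression quoted in the table.

```python
import numpy as np

def exp_loss_grad(w, X, y):
    """Gradient of the exponential loss L(w) = mean_i exp(-y_i <w, x_i>)."""
    margins = y * (X @ w)
    coeffs = np.exp(-margins)                # per-sample weights exp(-y_i <w, x_i>)
    return -(X.T @ (coeffs * y)) / len(y)

def prgd(X, y, T, R, lr=0.01):
    """Sketch of Progressive Rescaling GD: at the start of phase k, rescale
    ||w|| to R[k], then run plain GD steps until the next boundary T[k+1]."""
    w = np.full(X.shape[1], 1e-3)             # small nonzero init so rescaling is well defined
    for k in range(len(T) - 1):
        w = R[k] * w / np.linalg.norm(w)      # progressive norm rescaling
        for _ in range(T[k], T[k + 1]):       # plain GD steps within the phase
            w -= lr * exp_loss_grad(w, X, y)
    return w

# Illustrative usage on a toy linearly separable problem (d = 64, n = 300,
# matching only the shape of the sklearn digits setup quoted above).
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 64))
y = np.sign(X @ rng.normal(size=64))          # labels from a random ground-truth direction
T = [0, 1000, 2000, 3000, 4000]               # illustrative phase boundaries
R = [2.0 ** k for k in range(len(T))]         # geometrically growing radii
w_hat = prgd(X, y, T, R)
print("normalized margin:", np.min(y * (X @ w_hat)) / np.linalg.norm(w_hat))
```

The rescaling step is the point of the method: plain GD grows the parameter norm only logarithmically, and letting the radii R_k grow geometrically is what the paper credits for achieving margin maximization exponentially fast.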