Achieving Margin Maximization Exponentially Fast via Progressive Norm Rescaling

Authors: Mingze Wang, Zeping Min, Lei Wu

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To validate our theoretical findings, we present both synthetic and real-world experiments. Notably, PRGD also shows promise in enhancing the generalization performance when applied to linearly non-separable datasets and deep neural networks.
Researcher Affiliation | Academia | (1) School of Mathematical Sciences, Peking University, Beijing, China; (2) Center for Machine Learning Research, Peking University, Beijing, China.
Pseudocode | Yes | Algorithm 1: Progressive Rescaling Gradient Descent (PRGD). (See the illustrative sketch below the table.)
Open Source Code | No | The paper does not provide a specific link or explicit statement about releasing the source code for the methodology described.
Open Datasets | Yes | Specifically, we employ the digit datasets from Sklearn, which are image classification tasks with d = 64, n = 300. ... VGG-16 network (Simonyan & Zisserman, 2015) on the full CIFAR-10 dataset (Krizhevsky & Hinton, 2009). (See the loading sketch below the table.)
Dataset Splits | No | The paper mentions training and testing datasets, but it does not explicitly specify a validation dataset split or how validation was performed.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies | No | The paper mentions 'VGG architecture implemented in PyTorch' but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | The network was trained using a batch size of 64... We used a base learning rate of 1e-3, a momentum of 0.9, and a weight decay of 5e-4. ... We configured PRGD with T_k = 3000 * 2 + k * 3000 * 3, R_k = min(R_0 * 2 + k * 3000 * 0.2, 1000). (See the configuration sketch below the table.)
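
As noted in the Pseudocode row, the paper's method is presented as Algorithm 1 (PRGD). Below is a minimal sketch of the progressive-rescaling idea for a linear classifier: plain gradient descent on an exponential loss, with the parameter vector rescaled to a prescribed norm R_k at prescribed iterations T_k. The loss, learning rate, and schedule handling here are placeholders, not the paper's exact Algorithm 1.

```python
import numpy as np

def prgd(X, y, T_list, R_list, lr=0.01, steps=20000):
    """Illustrative PRGD-style loop (not the paper's exact algorithm).

    X: (n, d) data matrix, y: (n,) labels in {-1, +1}.
    T_list[k] gives the iteration at which the k-th rescaling happens;
    R_list[k] gives the norm the parameters are rescaled to at that time.
    """
    w = np.zeros(X.shape[1])
    k = 0
    for t in range(1, steps + 1):
        margins = y * (X @ w)                              # per-sample margins y_i <w, x_i>
        grad = -(X.T @ (y * np.exp(-margins))) / len(y)    # gradient of the mean exponential loss
        w = w - lr * grad                                  # ordinary gradient-descent step
        if k < len(T_list) and t == T_list[k]:             # progressive norm rescaling step
            w = R_list[k] * w / (np.linalg.norm(w) + 1e-12)
            k += 1
    return w
```

Per the paper's title, it is this progressive rescaling of the parameter norm at the times T_k that is claimed to drive margin maximization exponentially fast, rather than any change to the gradient step itself.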
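
The Sklearn digits data cited in the Open Datasets row can be loaded directly from scikit-learn; the sketch below assumes a simple truncation to n = 300 samples, since the paper does not state how its subset was selected.

```python
from sklearn.datasets import load_digits

# 8x8 digit images flattened to d = 64 features
X, y = load_digits(return_X_y=True)

# Illustrative n = 300 subset; the exact selection used in the paper is an assumption
X, y = X[:300], y[:300]
print(X.shape, y.shape)  # (300, 64) (300,)
```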
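
The hyperparameters quoted in the Experiment Setup row map onto a standard PyTorch SGD configuration. The sketch below uses torchvision's VGG-16 as a stand-in for the paper's unreleased implementation and reads the quoted T_k / R_k expressions literally; the R_0 value is a placeholder.

```python
import torch
import torchvision

# Stand-in VGG-16 for CIFAR-10 (the paper's exact implementation is not released)
model = torchvision.models.vgg16(num_classes=10)

# Quoted optimizer settings: lr 1e-3, momentum 0.9, weight decay 5e-4
optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-3, momentum=0.9, weight_decay=5e-4)

# CIFAR-10 loader with the quoted batch size of 64 (normalization omitted for brevity)
train_loader = torch.utils.data.DataLoader(
    torchvision.datasets.CIFAR10(root="./data", train=True, download=True,
                                 transform=torchvision.transforms.ToTensor()),
    batch_size=64, shuffle=True)

def prgd_schedule(k, R0=1.0):
    """Literal reading of the quoted T_k / R_k schedule; R0 is a placeholder value."""
    T_k = 3000 * 2 + k * 3000 * 3
    R_k = min(R0 * 2 + k * 3000 * 0.2, 1000)
    return T_k, R_k
```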