Soft ascent-descent as a stable and flexible alternative to flooding
Authors: Matthew Holland, Kosuke Nakatani
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through rigorous empirical tests using both simulated and real-world benchmark classification datasets, featuring neural networks both large and small, we discover that compared with ERM, SAM, and flooding, the proposed Soft AD achieves far and away the smallest generalization error in terms of the base loss, while maintaining competitive accuracy and small model norms, without any explicit regularization. |
| Researcher Affiliation | Academia | Matthew J. Holland (Osaka University); Kosuke Nakatani (Osaka University) |
| Pseudocode | No | The paper provides mathematical formulations for the Flooding (equation 4) and Soft AD (equation 8) algorithms, and refers to them as "algorithms". However, it does not include a distinct section, figure, or block explicitly labeled as "Pseudocode" or "Algorithm X" with structured, step-by-step instructions like typical pseudocode. (A hedged code sketch of the two objectives follows the table.) |
| Open Source Code | Yes | To re-create all of the numerical test results and figures from this paper, source code and Jupyter notebooks are available at a public GitHub repository: https://github.com/feedbackward/bdd-flood. |
| Open Datasets | Yes | The datasets we use are all standard benchmarks in the machine learning community: CIFAR-10, CIFAR-100, Fashion MNIST, and SVHN. All of these datasets are collected using classes defined in the torchvision.datasets module, with raw training/test splits left as-is with default settings. (A data-loading and split sketch follows the table.) |
| Dataset Splits | Yes | For each trial, we generate training and validation data of size 100, and test data of size 20000. All methods see the same data in each trial. ... in each trial we randomly select 80% of the raw training data to be used for actual training, with the remaining 20% used for validation. |
| Hardware Specification | Yes | Two units are equipped with an NVIDIA A100 (80GB), and the remaining machine uses an NVIDIA RTX 6000 Ada. |
| Software Dependencies | Yes | All of the experiments done in this section have been implemented using PyTorch 2. |
| Experiment Setup | Yes | All four methods of interest (ERM, Flooding, SAM, and Soft AD) are driven by the Adam optimizer, using a fixed learning rate of 0.001, with no momentum or weight decay. All methods use the multi-class logistic loss as their base loss (i.e., nn.CrossEntropyLoss in PyTorch), and are run for 500 epochs. We use a mini-batch size of 50 here... (A training-setup sketch follows the table.) |
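
Since the paper states Flooding and Soft AD only as equations (4) and (8), the minimal sketch below shows how such objectives might be written in PyTorch. The flooding form |L̄ − b| + b follows Ishida et al. (2020); the Soft AD form shown, a per-example softened absolute value ρ(u) = √(u² + 1) − 1 with threshold θ and scale σ, is an assumption about the general shape of equation (8), not a transcription of it, and the function names `flooding_loss` and `soft_ad_loss` are hypothetical.

```python
import torch


def flooding_loss(per_example_losses: torch.Tensor, b: float) -> torch.Tensor:
    """Flooding objective (paper eq. 4; Ishida et al. 2020):
    keep the mini-batch average loss hovering around the flood level b."""
    avg = per_example_losses.mean()
    return (avg - b).abs() + b


def soft_ad_loss(per_example_losses: torch.Tensor, theta: float, sigma: float) -> torch.Tensor:
    """Sketch of a Soft AD-style objective (cf. paper eq. 8), ASSUMING a
    per-example softened absolute value rho(u) = sqrt(u**2 + 1) - 1 with
    threshold theta and scale sigma; see the paper for the exact form."""
    u = (per_example_losses - theta) / sigma
    rho = torch.sqrt(u ** 2 + 1.0) - 1.0
    return sigma * rho.mean()
```

Note the structural difference this sketch highlights: flooding acts on the batch-averaged loss, while the Soft AD form applies a softened deviation per example before averaging.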
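
A minimal sketch of the dataset handling described in the Open Datasets and Dataset Splits rows, using CIFAR-10 as the example benchmark loaded through torchvision with its default train/test split, followed by the per-trial 80%/20% training/validation split. The root path, seed, and transform are illustrative choices, not taken from the authors' code.

```python
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

# Load one of the benchmarks named in the paper with torchvision defaults
# (raw training/test splits left as-is).
transform = transforms.ToTensor()  # assumed preprocessing
train_raw = datasets.CIFAR10(root="data", train=True, download=True, transform=transform)
test_set = datasets.CIFAR10(root="data", train=False, download=True, transform=transform)

# Per-trial random 80%/20% split of the raw training data.
n_train = int(0.8 * len(train_raw))
n_val = len(train_raw) - n_train
generator = torch.Generator().manual_seed(0)  # hypothetical per-trial seed
train_set, val_set = random_split(train_raw, [n_train, n_val], generator=generator)
```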
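
A minimal training-loop sketch matching the Experiment Setup row (Adam, learning rate 0.001, no weight decay, nn.CrossEntropyLoss, 500 epochs, mini-batch size 50). The ERM reduction shown is a placeholder where Flooding or Soft AD would wrap the per-example losses (see the first sketch above); the `train` helper and the choice of model are assumptions, not the authors' implementation.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, Dataset


def train(model: nn.Module, train_set: Dataset, epochs: int = 500, batch_size: int = 50) -> None:
    """Optimizer and loss settings as reported; everything else is illustrative."""
    loader = DataLoader(train_set, batch_size=batch_size, shuffle=True)
    criterion = nn.CrossEntropyLoss(reduction="none")  # per-example base losses
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0)
    for _ in range(epochs):
        for x, y in loader:
            per_example = criterion(model(x), y)
            loss = per_example.mean()  # ERM; Flooding/Soft AD would transform this instead
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```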