Symmetries, Flat Minima, and the Conserved Quantities of Gradient Flow

Authors: Bo Zhao, Iordan Ganev, Robin Walters, Rose Yu, Nima Dehmamy

ICLR 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present a set of experiments aimed at assessing the utility of the nonlinear group action and conserved quantities. A summary of the results is shown in Figure 2. We show that the value of conserved quantities can impact convergence rate and generalizability. We also find the nonlinear action to be viable for ensemble building to improve robustness under certain adversarial attacks.
Researcher Affiliation | Collaboration | Bo Zhao (University of California, San Diego, bozhao@ucsd.edu); Iordan Ganev (Radboud University, iganev@cs.ru.nl); Robin Walters (Northeastern University, r.walters@northeastern.edu); Rose Yu (University of California, San Diego, roseyu@ucsd.edu); Nima Dehmamy (IBM Research, nima.dehmamy@ibm.com)
Pseudocode | No | The paper describes steps for an algorithm in paragraph form in Section 4 ('0. Input: weight matrices...', '1. Determine the spherical coordinates...') but does not present it as a structured pseudocode block or algorithm environment.
Open Source Code | Yes | Our code is available at https://github.com/Rose-STL-Lab/Gradient-Flow-Symmetry.
Open Datasets | Yes | We test the group action on CIFAR-10.
Dataset Splits | No | The paper mentions using CIFAR-10 but does not provide specific details on how the dataset was split into training, validation, or test sets.
Hardware Specification | No | No specific hardware details such as GPU/CPU models, memory, or cloud instance types are mentioned for running experiments.
Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') are mentioned in the paper.
Experiment Setup | Yes | We repeat the gradient descent with learning rate 0.1, 0.01, and 0.001. The learning rate is set to 10⁻³... U and V are initialized with different variance... The model contains a convolution layer with kernel size 3, followed by a max pooling, a fully connected layer, a leaky ReLU activation, and another fully connected layer.
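
Based only on the architecture quoted in the Experiment Setup row, a minimal PyTorch sketch of the described CIFAR-10 model might look like the following. The channel count, hidden width, and pooling size are assumptions for illustration; they are not stated in the excerpt, and the paper's repository should be consulted for the exact configuration.

```python
import torch
import torch.nn as nn

class SmallConvNet(nn.Module):
    """Sketch of the quoted architecture: conv (kernel size 3) -> max pool ->
    fully connected -> leaky ReLU -> fully connected.
    Channel count (16), hidden width (128), and pool size (2) are assumed."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, 16, kernel_size=3)   # CIFAR-10 images have 3 channels
        self.pool = nn.MaxPool2d(2)                   # pooling size assumed to be 2
        self.fc1 = nn.Linear(16 * 15 * 15, 128)       # 32x32 -> 30x30 after conv -> 15x15 after pool
        self.act = nn.LeakyReLU()
        self.fc2 = nn.Linear(128, num_classes)

    def forward(self, x):
        x = self.pool(self.conv(x))
        x = torch.flatten(x, start_dim=1)
        return self.fc2(self.act(self.fc1(x)))

# One setting from the quoted learning-rate sweep (0.1, 0.01, 0.001),
# using plain gradient descent as described in the excerpt.
model = SmallConvNet()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
```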