Symmetries, Flat Minima, and the Conserved Quantities of Gradient Flow
Authors: Bo Zhao, Iordan Ganev, Robin Walters, Rose Yu, Nima Dehmamy
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present a set of experiments aimed at assessing the utility of the nonlinear group action and conserved quantities. A summary of the results is shown in Figure 2. We show that the value of conserved quantities can impact convergence rate and generalizability. We also find the nonlinear action to be viable for ensemble building to improve robustness under certain adversarial attacks. (A minimal numerical sketch of one such conserved quantity appears below the table.) |
| Researcher Affiliation | Collaboration | Bo Zhao (University of California, San Diego) bozhao@ucsd.edu; Iordan Ganev (Radboud University) iganev@cs.ru.nl; Robin Walters (Northeastern University) r.walters@northeastern.edu; Rose Yu (University of California, San Diego) roseyu@ucsd.edu; Nima Dehmamy (IBM Research) nima.dehmamy@ibm.com |
| Pseudocode | No | The paper describes steps for an algorithm in paragraph form in Section 4 ('0. Input: weight matrices...', '1. Determine the spherical coordinates...') but does not present it as a structured pseudocode block or algorithm environment. (A hedged sketch of the quoted spherical-coordinate step appears below the table.) |
| Open Source Code | Yes | Our code is available at https://github.com/Rose-STL-Lab/Gradient-Flow-Symmetry. |
| Open Datasets | Yes | We test the group action on CIFAR-10. |
| Dataset Splits | No | The paper mentions using CIFAR-10 but does not provide specific details on how the dataset was split into training, validation, or test sets. |
| Hardware Specification | No | No specific hardware details such as GPU/CPU models, memory, or cloud instance types are mentioned for running experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') are mentioned in the paper. |
| Experiment Setup | Yes | We repeat the gradient descent with learning rates 0.1, 0.01, and 0.001. The learning rate is set to 10⁻³... U and V are initialized with different variances... The model contains a convolution layer with kernel size 3, followed by a max pooling, a fully connected layer, a leaky ReLU activation, and another fully connected layer. (An architecture sketch appears below the table.) |
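The Research Type row reports that the value of conserved quantities can affect convergence and generalization. For intuition, here is a minimal, self-contained sketch, not taken from the authors' repository, that numerically checks the classical conserved quantity of gradient flow for a two-layer *linear* network f(x) = xUV, namely Q = UᵀU − VVᵀ. The data, shapes, learning rate, and loss below are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
n, d_in, d_hid, d_out = 64, 8, 16, 4
X = torch.randn(n, d_in)
Y = torch.randn(n, d_out)

# Leaf tensors so plain gradient descent can update them in place.
U = (0.1 * torch.randn(d_in, d_hid)).requires_grad_()
V = (0.1 * torch.randn(d_hid, d_out)).requires_grad_()

def conserved(U, V):
    # For linear networks, gradient flow conserves Q = U^T U - V V^T.
    return U.T @ U - V @ V.T

Q0 = conserved(U, V).detach().clone()

lr = 1e-3  # small step so gradient descent approximates gradient flow
for _ in range(2000):
    loss = ((X @ U @ V - Y) ** 2).mean()
    loss.backward()
    with torch.no_grad():
        U -= lr * U.grad
        V -= lr * V.grad
        U.grad.zero_()
        V.grad.zero_()

drift = (conserved(U, V).detach() - Q0).norm() / Q0.norm()
print(f"loss {loss.item():.4f}, relative drift of Q: {drift:.2e}")
```

With a small enough step size, the printed relative drift of Q stays near zero even as the loss decreases, which is the discrete-time echo of the continuous-time conservation law.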
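The Pseudocode row quotes only the first step of the paragraph-form algorithm, 'Determine the spherical coordinates...'. As a purely illustrative aid, a standard Cartesian-to-hyperspherical conversion for a flattened weight vector is sketched below; the remaining steps of the authors' algorithm are elided in the excerpt and are not reconstructed here.

```python
import torch

def to_spherical(x: torch.Tensor):
    """Convert a flat vector x in R^n to hyperspherical coordinates (r, phi)."""
    r = x.norm()
    # Tail norms: tail[k] = sqrt(x[k]^2 + ... + x[n-1]^2)
    tail = torch.sqrt(torch.flip(torch.cumsum(torch.flip(x**2, [0]), 0), [0]))
    # phi_k = atan2(tail_{k+1}, x_k) for k = 1 .. n-1 ...
    phi = torch.atan2(tail[1:], x[:-1])
    # ... except the last angle, which keeps the sign of x_n.
    phi[-1] = torch.atan2(x[-1], x[-2])
    return r, phi

r, phi = to_spherical(torch.randn(10))  # e.g. a flattened weight matrix
```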
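The Experiment Setup row describes the CIFAR-10 test model only in prose. Below is a hedged PyTorch sketch of that architecture: the excerpt fixes the kernel size (3) and the layer types, while the channel count, pooling window, and hidden width here are assumptions.

```python
import torch
import torch.nn as nn

class SmallCNN(nn.Module):
    def __init__(self, channels=16, hidden=128, num_classes=10):
        super().__init__()
        self.conv = nn.Conv2d(3, channels, kernel_size=3)  # kernel size 3, per the paper
        self.pool = nn.MaxPool2d(2)                        # pooling window assumed
        # 32x32 input -> 30x30 after the 3x3 conv -> 15x15 after pooling
        self.fc1 = nn.Linear(channels * 15 * 15, hidden)
        self.act = nn.LeakyReLU()
        self.fc2 = nn.Linear(hidden, num_classes)

    def forward(self, x):
        x = self.pool(self.conv(x))
        x = x.flatten(1)
        return self.fc2(self.act(self.fc1(x)))

model = SmallCNN()
print(model(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```

Feeding a CIFAR-10-shaped batch (3×32×32) through the module confirms the flattened size expected by the first fully connected layer.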