Keep the Momentum: Conservation Laws beyond Euclidean Gradient Flows
Authors: Sibylle Marcotte, Rémi Gribonval, Gabriel Peyré
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We consider here a 3-layer MLP trained for classification on the MNIST dataset (Le Cun et al., 2010) with the cross entropy loss function and a ReLU non-linearity... Figure 1, left, shows the evolution of the loss for a range of step size δ up to almost no convergence... Figure 2 shows how the evolution of the loss and the preserved quantities for GF is impacted by the momentum parameter µ = 1/τ. |
| Researcher Affiliation | Academia | Sibylle Marcotte 1, Rémi Gribonval 2, Gabriel Peyré 1,3. 1 ENS PSL Univ. 2 Univ Lyon, EnsL, UCBL, CNRS, Inria, LIP. 3 CNRS. Correspondence to: Sibylle Marcotte <sibylle.marcotte@ens.fr>. |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Our code to compute them is available at https://github.com/sibyllema/Conservation_laws_ICML. |
| Open Datasets | Yes | We consider here a 3-layer MLP trained for classification on the MNIST dataset (Le Cun et al., 2010). |
| Dataset Splits | No | The paper mentions using training and test sets from MNIST but does not specify a validation set or explicit split percentages for training, validation, and testing. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments, such as specific GPU or CPU models. |
| Software Dependencies | No | The paper mentions using SageMath via its Python interface ("We used the software Sage Math (The Sage Developers, 2022), which relies on a Python interface."), but it does not provide a full list of software dependencies with version numbers. |
| Experiment Setup | Yes | We consider the following time discretization of the flows, where time at step k is t = kδ and δ > 0 is the time step... This can be re-written in the usual form of a gradient descent with momentum θ_{k+1} = θ_k − α M_k ∇E_Z(θ_k) + β(θ_k − θ_{k−1}), where α := δ/(ν + µ/δ) and β := µ/(δν + µ) < 1. Here β ∈ [0, 1) is the momentum (extrapolation) parameter, so that β = 0 corresponds to usual gradient descent, and setting β = 1 is maximum momentum (which is not in general ensured to converge). Figure 1, left, shows the evolution of the loss for a range of step size δ... Figure 2 shows how the evolution of the loss and the preserved quantities for GF is impacted by the momentum parameter µ = 1/τ. |
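
As a reading aid, the quoted update rule can be made concrete with a short NumPy sketch (this is not the authors' released code, which is linked in the Open Source Code row): it applies the gradient-descent-with-momentum update θ_{k+1} = θ_k − α ∇E_Z(θ_k) + β(θ_k − θ_{k−1}), with α := δ/(ν + µ/δ) and β := µ/(δν + µ), to a small 3-layer ReLU MLP trained with the cross-entropy loss. The synthetic data standing in for MNIST, the layer widths, the hyperparameter values (δ, ν, µ), and the choice of Euclidean metric (M_k = Id) are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

# Minimal sketch (not the authors' code) of the discretized momentum flow
# quoted in the "Experiment Setup" row. Hyperparameter names (delta, nu, mu)
# follow the quoted formulas; the network sizes, the synthetic data and the
# Euclidean metric M_k = Id are illustrative assumptions.

rng = np.random.default_rng(0)

# Toy data standing in for MNIST (784-dim inputs, 10 classes).
X = rng.standard_normal((256, 784))
y = rng.integers(0, 10, size=256)

# 3-layer MLP parameters (two hidden ReLU layers + linear output).
sizes = [784, 64, 64, 10]
params = [rng.standard_normal((m, n)) * np.sqrt(2.0 / m)
          for m, n in zip(sizes[:-1], sizes[1:])]

def loss_and_grads(params, X, y):
    """Cross-entropy loss of the ReLU MLP and its gradients (manual backprop)."""
    W1, W2, W3 = params
    a1 = np.maximum(X @ W1, 0.0)           # ReLU
    a2 = np.maximum(a1 @ W2, 0.0)          # ReLU
    logits = a2 @ W3
    logits -= logits.max(axis=1, keepdims=True)
    p = np.exp(logits); p /= p.sum(axis=1, keepdims=True)
    n = X.shape[0]
    loss = -np.log(p[np.arange(n), y]).mean()
    d_logits = p.copy(); d_logits[np.arange(n), y] -= 1.0; d_logits /= n
    gW3 = a2.T @ d_logits
    d_a2 = (d_logits @ W3.T) * (a2 > 0)
    gW2 = a1.T @ d_a2
    d_a1 = (d_a2 @ W2.T) * (a1 > 0)
    gW1 = X.T @ d_a1
    return loss, [gW1, gW2, gW3]

# Discretization constants from the quoted re-writing of the momentum flow:
#   theta_{k+1} = theta_k - alpha * grad E(theta_k) + beta * (theta_k - theta_{k-1})
# with alpha := delta / (nu + mu/delta) and beta := mu / (delta*nu + mu) < 1.
delta, nu, mu = 0.1, 1.0, 0.05            # illustrative values, not the paper's
alpha = delta / (nu + mu / delta)
beta = mu / (delta * nu + mu)
assert 0.0 <= beta < 1.0                  # beta in [0, 1); beta = 0 is plain GD

prev = [p.copy() for p in params]         # theta_{k-1}
for k in range(50):
    loss, grads = loss_and_grads(params, X, y)
    new = [p - alpha * g + beta * (p - q)
           for p, g, q in zip(params, grads, prev)]
    prev, params = params, new
    if k % 10 == 0:
        print(f"step {k:3d}  loss {loss:.4f}")
```

Note that setting µ = 0 gives β = 0 and recovers plain gradient descent, matching the quoted remark that β = 0 corresponds to usual gradient descent.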