Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Generalization as Dynamical Robustness – The Role of Riemannian Contraction in Supervised Learning

Authors: Leo Kozachkov, Patrick Wensing, Jean-Jacques Slotine

TMLR 2023 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Theoretical | In particular, we prove that Riemannian contraction in a supervised learning setting implies generalization. We show that if a learning algorithm is contracting in some Riemannian metric with rate λ > 0, it is uniformly algorithmically stable with rate O(1/λn), where n is the number of examples in the training set. The results hold for stochastic and deterministic optimization, in both continuous and discrete-time, for convex and non-convex loss surfaces. The associated generalization bounds reduce to well-known results in the particular case of gradient descent over convex or strongly convex loss surfaces.
Researcher Affiliation | Academia | Leo Kozachkov EMAIL, Department of Brain and Cognitive Sciences, K. Lisa Yang Integrative Computational Neuroscience (ICoN) Center, Massachusetts Institute of Technology; Patrick M. Wensing EMAIL, Department of Aerospace and Mechanical Engineering, University of Notre Dame; Jean-Jacques Slotine EMAIL, Department of Mechanical Engineering and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology
Pseudocode | No | The paper describes mathematical equations and derivations for dynamical systems and optimizers but does not include any explicitly labeled pseudocode or algorithm blocks. For example, equations like (16) describe an iterative optimizer but are not presented as structured algorithms.
Open Source Code | No | The paper mentions using "an implementation of the Bartels–Stewart algorithm in SciPy (Bartels & Stewart, 1972; Virtanen et al., 2020)" and "Cholesky decomposition (also using SciPy)" but does not state that the authors' own code for their methodology is open source or provide a link to a repository.
Open Datasets | No | The paper uses mathematical constructs and theoretical examples, such as the Rosenbrock function, rather than specific, named public datasets for empirical evaluation. No datasets are provided with access information, links, or formal citations.
Dataset Splits | No | Since the paper focuses on theoretical analysis and does not conduct experiments on specific datasets, there is no mention of training, test, or validation dataset splits.
Hardware Specification | No | The paper is a theoretical work focusing on mathematical analysis and derivations, and thus does not include any details about hardware specifications used for running experiments.
Software Dependencies | Yes | We then calculate the ratio χ/λ_M (the only metric-dependent terms in our algorithmic stability bound). We repeat this procedure 4000 times. (Left subplot) The ratio χ/λ_M plotted against χ. (Right subplot) The ratio χ/λ_M bound plotted against λ_M. Details are in the main text. These plots illustrate our theoretical result that the identity metric gives the optimal (i.e. smallest) algorithmic stability bound. They also illustrate the reason: the identity metric simultaneously obtains the smallest condition number and the largest contraction rate, thus minimizing the ratio of the former to the latter. This result is illustrated in Figure 4. To create this plot we generated a random G ∈ ℝ^{3×3}. Then we generated random Q and solved the Lyapunov equation for M using an implementation of the Bartels–Stewart algorithm in SciPy (Bartels & Stewart, 1972; Virtanen et al., 2020).
Experiment Setup | No | The paper focuses on theoretical analysis and mathematical proofs rather than empirical experiments. While it discusses concepts like learning rates (e.g., "learning rate scheduler ρ(θ, t)") and initial conditions (e.g., "initialized at the origin"), these are theoretical assumptions for analysis, not specific hyperparameters or system-level settings for a concrete experimental setup.
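The implication quoted under Research Type can be sketched in standard contraction-analysis notation (the specific symbols below are a hedged reconstruction, not quoted from the paper):

```latex
% Sketch, assuming learning dynamics \dot{x} = f(x, t) and a Riemannian
% metric M(x, t) \succ 0: contraction with rate \lambda > 0 implies
% uniform algorithmic stability of order 1/(\lambda n).
\[
  \frac{\partial f}{\partial x}^{\!\top} M + M \frac{\partial f}{\partial x} + \dot{M}
  \;\preceq\; -2\lambda M
  \quad\Longrightarrow\quad
  \varepsilon_{\mathrm{stab}} \;=\; O\!\left(\frac{1}{\lambda n}\right),
\]
% where n is the number of training examples. For gradient descent on a
% strongly convex loss in the identity metric, this recovers classical
% stability-based generalization bounds, as the abstract states.
```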
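The procedure quoted under Software Dependencies (random G, random Q, solve the Lyapunov equation for the metric M, then compare condition number χ against contraction rate λ_M) can be sketched as follows. This is a minimal NumPy reconstruction, not the authors' code: it replaces SciPy's Bartels–Stewart solver with a Kronecker-product solve, and the exact definition of λ_M here is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random 3x3 matrix, shifted to be Hurwitz (all eigenvalues in the open
# left half-plane) so the Lyapunov equation has a unique solution M > 0.
A = rng.standard_normal((3, 3))
G = A - (np.max(np.linalg.eigvals(A).real) + 1.0) * np.eye(3)

# Random symmetric positive-definite Q.
B = rng.standard_normal((3, 3))
Q = B @ B.T + np.eye(3)

# Solve G^T M + M G = -Q via vectorization:
# (I (x) G^T + G^T (x) I) vec(M) = -vec(Q), with column-major vec.
I = np.eye(3)
K = np.kron(I, G.T) + np.kron(G.T, I)
M = np.linalg.solve(K, -Q.flatten(order="F")).reshape((3, 3), order="F")
M = 0.5 * (M + M.T)  # symmetrize against round-off

# chi: condition number of the metric.
chi = np.linalg.cond(M)

# lam: largest rate with G^T M + M G <= -2*lam*M, i.e. half the smallest
# generalized eigenvalue of (Q, M) (assumed definition of lambda_M).
lam = 0.5 * np.min(np.linalg.eigvals(np.linalg.solve(M, Q)).real)

print(chi, lam, chi / lam)
```

Repeating this loop over many random draws (the quote says 4000) and plotting χ/λ_M against χ and against λ_M would reproduce the qualitative comparison described, with the identity metric minimizing the ratio.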