Tighter Expected Generalization Error Bounds via Wasserstein Distance

Authors: Borja Rodríguez-Gálvez, Germán Bassi, Ragnar Thobaben, Mikael Skoglund

NeurIPS 2021

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Theoretical | Example 1 (Gaussian location model). Consider the problem of estimating the mean $\mu$ of a $d$-dimensional Gaussian distribution with known covariance matrix $\sigma^2 I_d$. Further consider that there are $n$ samples $S = (Z_1, \ldots, Z_n)$ available, the loss is measured with the Euclidean distance $\ell(w, z) = \lVert w - z \rVert_2$, and the estimation is their empirical mean $W = \frac{1}{n} \sum_{i=1}^{n} Z_i$. In this example, the expected generalization error can be calculated exactly (see Appendix E): $\mathrm{gen}(W, S) = d\sigma^2/(2n)$. ... Figure 1: Expected generalization error and generalization error bounds for the Gaussian location model with $\mathcal{N}(\mu, 1)$ (left) and $\mathcal{N}(\mu, I_{250})$ (right). See Appendix E for the details. |
| Researcher Affiliation | Collaboration | Borja Rodríguez-Gálvez, KTH Royal Institute of Technology, Stockholm, Sweden, borjarg@kth.se; Germán Bassi, Ericsson Research, Stockholm, Sweden, german.bassi@ericsson.com; Ragnar Thobaben, KTH Royal Institute of Technology, Stockholm, Sweden, ragnart@kth.se; Mikael Skoglund, KTH Royal Institute of Technology, Stockholm, Sweden, skoglund@kth.se |
| Pseudocode | No | The paper describes mathematical proofs and outlines their steps but does not include any pseudocode or algorithm blocks with structured steps. |
| Open Source Code | No | The paper's checklist under “3. If you ran experiments...” explicitly states “[N/A]” for “Did you include the code, data, and instructions needed to reproduce the main experimental results (either in the supplemental material or as a URL)?”. No other statement about code release is found. |
| Open Datasets | No | The paper uses a theoretical “Gaussian location model” as an example for analytical calculations, which defines samples $Z_i$ drawn from a distribution $P_Z$. It does not use or provide concrete access information (link, citation, repository) for a publicly available, named dataset. |
| Dataset Splits | No | The paper is theoretical and presents analytical calculations for a specific model; it does not involve empirical experiments with data splits for training, validation, or testing. |
| Hardware Specification | No | The paper is theoretical and involves analytical derivations and calculations rather than empirical experiments, so no hardware specifications for running experiments are mentioned. |
| Software Dependencies | No | The paper is theoretical and does not describe software used for its analysis, so no software dependencies with version numbers are provided. |
| Experiment Setup | No | The paper is theoretical and presents analytical derivations and comparisons. It does not describe an experimental setup with specific hyperparameters, training configurations, or system-level settings, as no empirical experiments were conducted. |
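
The Gaussian location model quoted in the Research Type row can also be explored numerically. The paper releases no code, so the following is only a minimal Monte Carlo sketch, not the authors' procedure. It assumes the loss is the unsquared Euclidean distance between `w` and `z` (a reading of the quoted text), and the function name `estimate_generalization_gap`, the trial count, and the seed are hypothetical choices made for illustration. It estimates the expected gap between population risk and empirical risk when the hypothesis is the empirical mean; the result can be compared against the value derived in the paper's Appendix E.

```python
import numpy as np


def estimate_generalization_gap(n, d, sigma=1.0, trials=20000, seed=0):
    """Monte Carlo estimate of E[population risk - empirical risk].

    Assumptions (not taken from the paper's code, which does not exist):
    - samples Z_i ~ N(mu, sigma^2 I_d), with mu = 0 since the gap is
      invariant to shifts of the mean;
    - loss(w, z) = ||w - z||_2 (unsquared Euclidean distance);
    - the hypothesis W is the empirical mean of the n training samples.
    """
    rng = np.random.default_rng(seed)

    # Independent training sets, shape (trials, n, d).
    samples = sigma * rng.standard_normal((trials, n, d))

    # Empirical mean of each training set, shape (trials, d).
    w = samples.mean(axis=1)

    # Empirical risk: average loss of W on its own training samples.
    empirical_risk = np.linalg.norm(samples - w[:, None, :], axis=2).mean(axis=1)

    # Population risk, estimated with one fresh sample per trial.
    fresh = sigma * rng.standard_normal((trials, d))
    population_risk = np.linalg.norm(fresh - w, axis=1)

    return float((population_risk - empirical_risk).mean())


if __name__ == "__main__":
    for n in (5, 20, 80):
        gap = estimate_generalization_gap(n=n, d=2)
        print(f"n={n:3d}  estimated gap={gap:.4f}")
```

Using a single fresh sample per trial keeps the population-risk estimate unbiased while keeping memory use low; averaging over many trials is what reduces the noise.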