Distributionally Robust Optimization and Generalization in Kernel Methods

Authors: Matthew Staib, Stefanie Jegelka

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental (6 experiments) | In subsection 4.3 we proposed an alternate regularizer for kernel ridge regression, specifically, penalizing ‖f‖²_{σ/√2} instead of ‖f‖²_σ. Here we probe the new regularizer on a synthetic problem where we can precisely compute the population risk R_P(f). Consider the Gaussian kernel k_σ with σ = 1. Fix the ground truth h = k_σ(1, ·) − k_σ(−1, ·) ∈ H_σ. Sample 10^4 points from a standard one-dimensional Gaussian, and set this as the population P. Then subsample n points x_i with values y_i = h(x_i) + ε_i, where the ε_i are Gaussian. We consider both an easy regime, where n = 10^3 and Var(ε_i) = 10^{-2}, and a hard regime, where n = 10^2 and Var(ε_i) = 1. On the empirical data, we fit f ∈ H_σ by minimizing square loss plus either λ‖f‖²_σ (as is typical) or λ‖f‖²_{σ/√2} (our proposal). We average over 10^2 resampling trials for the easy case and 10^3 for the hard case, and report 95% confidence intervals. Figure 1 shows the result in each case for a parameter sweep over λ. (A code sketch of this setup follows the table.)
Researcher Affiliation | Academia | Matthew Staib, MIT CSAIL, mstaib@mit.edu; Stefanie Jegelka, MIT CSAIL, stefje@csail.mit.edu
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository for the methodology described.
Open Datasets | No | The paper uses synthetic data generated by the authors, stating: 'Sample 10^4 points from a standard one-dimensional Gaussian, and set this as the population P. Then subsample n points x_i with values y_i = h(x_i) + ε_i, where the ε_i are Gaussian.' There is no concrete access information (link, DOI, or formal citation) for a publicly available dataset.
Dataset Splits | No | The paper describes using a synthetic dataset and sampling points, but does not provide specific details on dataset splits (e.g., train/validation/test percentages, per-split sample counts, or cross-validation folds) needed to reproduce the data partitioning.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used to run its experiments.
Software Dependencies | No | The paper does not provide specific software dependencies (e.g., library or solver names with version numbers) needed to replicate the experiments.
Experiment Setup | No | The paper mentions a 'parameter sweep over λ' but does not report the setup needed to reproduce it, such as the specific λ values swept, how the regularized problems were solved, or other training configuration details.
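
As a companion to the Research Type excerpt, here is a minimal NumPy sketch of the easy regime of that synthetic experiment: draw the population P, subsample noisy targets, fit kernel ridge regression with the standard λ‖f‖²_σ penalty and with the proposed λ‖f‖²_{σ/√2} penalty, and compare population risk across a sweep of λ. This is a reconstruction under stated assumptions, not the authors' code: the fit is restricted to the span of k_σ(x_i, ·), the closed form used for the ‖f‖²_{σ/√2} penalty matrix (a rescaled Gaussian Gram matrix at bandwidth √(2σ² − σ'²), valid in one dimension) comes from the Fourier transform of the Gaussian kernel, and the λ grid, random seed, and all names are illustrative.

import numpy as np

rng = np.random.default_rng(0)
sigma = 1.0

def gauss_gram(a, b, s):
    # Gaussian kernel matrix K[i, j] = exp(-(a_i - b_j)^2 / (2 s^2)).
    return np.exp(-((a[:, None] - b[None, :]) ** 2) / (2.0 * s ** 2))

def h(x):
    # Ground truth h = k_sigma(1, .) - k_sigma(-1, .).
    return gauss_gram(np.atleast_1d(x), np.array([1.0, -1.0]), sigma) @ np.array([1.0, -1.0])

# Population P: 10^4 points from a standard one-dimensional Gaussian.
X_pop = rng.standard_normal(10_000)

# Easy regime: n = 10^3 subsampled points, noise variance 10^-2.
n, noise_var = 1_000, 1e-2
X = rng.choice(X_pop, size=n, replace=False)
y = h(X) + np.sqrt(noise_var) * rng.standard_normal(n)

K = gauss_gram(X, X, sigma)  # Gram matrix at bandwidth sigma

# Penalty matrices. Standard KRR penalizes alpha^T K alpha = ||f||_sigma^2.
# For ||f||_{sigma/sqrt(2)}^2 we assume the 1-D Gaussian-Fourier identity
# ||sum_i alpha_i k_sigma(x_i, .)||_{sigma'}^2 = c * alpha^T K_s alpha
# with s = sqrt(2 sigma^2 - sigma'^2) and c = sigma^2 / (sigma' * s);
# this closed form is our reconstruction, not quoted from the paper.
sigma_p = sigma / np.sqrt(2.0)
s = np.sqrt(2.0 * sigma ** 2 - sigma_p ** 2)
R = (sigma ** 2 / (sigma_p * s)) * gauss_gram(X, X, s)

def fit(lam, penalty):
    # Minimize ||K a - y||^2 + lam * a^T penalty a via the normal equations,
    # with a tiny jitter added for numerical stability.
    A = K @ K + lam * penalty + 1e-10 * np.eye(n)
    return np.linalg.solve(A, K @ y)

def population_risk(alpha):
    # Mean squared error of f(x) = sum_i alpha_i k_sigma(x_i, x) over the population P.
    preds = gauss_gram(X_pop, X, sigma) @ alpha
    return np.mean((preds - h(X_pop)) ** 2)

for lam in np.logspace(-4, 1, 6):  # illustrative lambda grid; the paper's sweep is not given
    risk_std = population_risk(fit(lam, K))   # penalty ||f||_sigma^2
    risk_new = population_risk(fit(lam, R))   # penalty ||f||_{sigma/sqrt(2)}^2
    print(f"lambda={lam:.0e}  standard risk={risk_std:.4f}  proposed risk={risk_new:.4f}")

For the hard regime, set n = 10^2 and noise_var = 1.0. The paper additionally averages the population risk over many resampling trials and reports 95% confidence intervals, which this sketch omits.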