Interpreting Robust Optimization via Adversarial Influence Functions

Authors: Zhun Deng, Cynthia Dwork, Jialiang Wang, Linjun Zhang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Figure 1, we can see that as ε becomes smaller, Î(n, ε) and Ŝ(n, ε) gradually go to 0. We remark here that we do not let ε be exactly 0 in our experiments, since PGD cannot obtain the exact optimal solutions for θ̂^M_min and θ̂^M_{ε,min}. The model we use is a linear regression model with 500 inputs drawn from a two-dimensional standard Gaussian, i.e. x ~ N(0, I). We fit y with y = 2x_1 − 3.4x_2 + η and η ~ 0.1 · N(0, I). In the experiments in Figure 2(a), we show the trend for Ŝ_ε(G_L) by taking sample size n = 5000 and σ_ξ = 0.1. We take the average result over 1000 repetitions. In Figure 2(b), we experimentally demonstrate the effectiveness of the approximation of AIF in kernel regressions with the neural tangent kernel on MNIST. The estimation is based on the average over 10 draws of 300 randomly sampled MNIST examples. (See the data-generation sketch after the table.)
Researcher Affiliation | Academia | ¹John A. Paulson School of Engineering and Applied Sciences, Harvard University; ²Department of Statistics, Rutgers University.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | Yes | In Figure 2(b), we experimentally demonstrate the effectiveness of the approximation of AIF in kernel regressions with the neural tangent kernel on MNIST. The estimation is based on the average over 10 draws of 300 randomly sampled MNIST examples.
Dataset Splits | No | We have a training dataset (X^t, Y^t) = {(x^t_1, y^t_1), …, (x^t_{n_t}, y^t_{n_t})} and an evaluation dataset (X^e, Y^e) = {(x^e_1, y^e_1), …, (x^e_{n_e}, y^e_{n_e})}. The paper mentions the existence of training and evaluation datasets but does not provide specific split percentages, sample counts, or methodology for creating these splits from a larger dataset.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using projected gradient descent (PGD) for robust optimization but does not specify any programming languages, libraries, frameworks, or their version numbers that were used for implementation.
Experiment Setup | Yes | The model we use is a linear regression model with 500 inputs drawn from a two-dimensional standard Gaussian, i.e. x ~ N(0, I). We fit y with y = 2x_1 − 3.4x_2 + η and η ~ 0.1 · N(0, I). In the experiments in Figure 2(a), we show the trend for Ŝ_ε(G_L) by taking sample size n = 5000 and σ_ξ = 0.1. We train the neural network by randomly initialized gradient descent on the quadratic loss over data S. In particular, we initialize the parameters randomly: w_r ~ N(0, κ²I) and a_r ~ Unif(−1, 1) for all r ∈ [m]; Jacot et al. [2018] then showed that such a network converges to the solution produced by kernel regression with the so-called Neural Tangent Kernel (NTK) matrix.
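As a concrete illustration of the synthetic setup quoted in the Research Type and Experiment Setup rows, the sketch below generates the described data (x ~ N(0, I) in two dimensions, y = 2x_1 − 3.4x_2 + η with η ~ 0.1 · N(0, I)) and fits the non-robust least-squares estimate. This is a minimal sketch under stated assumptions: the function names, seed, and default sample size are ours, and the paper's robust estimator θ̂^M_{ε,min} obtained with PGD is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice for reproducibility

def make_synthetic_data(n=500, noise_scale=0.1):
    """Synthetic regression data as described in the quoted setup:
    x ~ N(0, I_2), y = 2*x1 - 3.4*x2 + eta, eta ~ noise_scale * N(0, 1)."""
    X = rng.standard_normal((n, 2))             # two-dimensional standard Gaussian inputs
    eta = noise_scale * rng.standard_normal(n)  # Gaussian noise scaled by 0.1
    y = 2.0 * X[:, 0] - 3.4 * X[:, 1] + eta
    return X, y

def fit_least_squares(X, y):
    """Ordinary least-squares estimate (the non-robust baseline theta_hat_min)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

if __name__ == "__main__":
    X, y = make_synthetic_data()
    theta_hat = fit_least_squares(X, y)
    print("theta_hat_min ~", theta_hat)  # should be close to [2.0, -3.4]
```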
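The Experiment Setup quotation ends just before the paper's NTK matrix definition. As a hedged sketch only, the snippet below applies the quoted random initialization, w_r ~ N(0, κ²I) and a_r ~ Unif(−1, 1), to the two-layer ReLU parameterization standard in the NTK literature; the 1/√m output scaling, the width m, and the value of κ are assumptions not stated in the quote, and "Unif(−1, 1)" may alternatively denote the Rademacher {−1, +1} draw used in some NTK papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_two_layer_relu(m, d, kappa=0.1):
    """Random initialization from the quoted NTK setup:
    w_r ~ N(0, kappa^2 I_d) and a_r ~ Unif(-1, 1) for r = 1..m.
    (Some NTK papers instead draw a_r uniformly from {-1, +1}.)"""
    W = kappa * rng.standard_normal((m, d))  # hidden-layer weights, rows are w_r
    a = rng.uniform(-1.0, 1.0, size=m)       # output-layer weights a_r
    return W, a

def two_layer_relu(X, W, a):
    """f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x) -- the standard two-layer
    parameterization under which gradient descent tracks NTK regression; the
    1/sqrt(m) scaling is assumed, not taken from the quote."""
    m = W.shape[0]
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

if __name__ == "__main__":
    X = rng.standard_normal((4, 2))                 # a few example inputs
    W, a = init_two_layer_relu(m=1024, d=2)
    print(two_layer_relu(X, W, a))                  # network outputs at initialization
```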