Interpreting Robust Optimization via Adversarial Influence Functions
Authors: Zhun Deng, Cynthia Dwork, Jialiang Wang, Linjun Zhang
ICML 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In Figure 1, we can see that as ε becomes smaller, Î(n, ε) and Ŝ(n, ε) gradually go to 0. We remark here that we do not let ε be exactly 0 in our experiments, since PGD cannot obtain the exact optimal solutions for θ̂^M_min and θ̂^M_{ε,min}. The model we use is a linear regression model with 500 inputs drawn from a two-dimensional standard Gaussian, i.e. x ∼ N(0, I). We fit y with y = 2x_1 − 3.4x_2 + η and η ∼ 0.1 · N(0, I). In the experiments in Figure 2(a), we show the trend for Ŝ_ε(GL) by taking sample size n = 5000, σ_ξ = 0.1. We take the average result over 1000 repetitions. In Figure 2(b), we experimentally demonstrate the effectiveness of the approximation of AIF in kernel regressions with the neural tangent kernel on MNIST. The estimation is based on averaging over 10 random draws of 300 examples from MNIST. (A minimal sketch of this synthetic setup, including a PGD-style robust fit, appears after the table.) |
| Researcher Affiliation | Academia | John A. Paulson School of Engineering and Applied Sciences, Harvard University; Department of Statistics, Rutgers University. |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described. |
| Open Datasets | Yes | In Figure 2(b), we experimentally demonstrate the effectiveness of the approximation of AIF in kernel regressions with the neural tangent kernel on MNIST. The estimation is based on averaging over 10 random draws of 300 examples from MNIST. |
| Dataset Splits | No | We have a training dataset (X^t, Y^t) = {(x^t_1, y^t_1), …, (x^t_{n_t}, y^t_{n_t})} and an evaluation dataset (X^e, Y^e) = {(x^e_1, y^e_1), …, (x^e_{n_e}, y^e_{n_e})}. The paper mentions the existence of training and evaluation datasets but does not provide specific split percentages, sample counts, or methodology for creating these splits from a larger dataset. |
| Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments. |
| Software Dependencies | No | The paper mentions using projected gradient descent (PGD) for robust optimization but does not specify any programming languages, libraries, frameworks, or their version numbers that were used for implementation. |
| Experiment Setup | Yes | The model we use is a linear regression model with 500 inputs drawn from a two-dimensional standard Gaussian, i.e. x ∼ N(0, I). We fit y with y = 2x_1 − 3.4x_2 + η and η ∼ 0.1 · N(0, I). In the experiments in Figure 2(a), we show the trend for Ŝ_ε(GL) by taking sample size n = 5000, σ_ξ = 0.1. We train the neural network by randomly initialized gradient descent on the quadratic loss over data S. In particular, we initialize the parameters randomly: w_r ∼ N(0, κ²I), a_r ∼ U(−1, 1), for all r ∈ [m]; Jacot et al. [2018] showed that such a network converges to the solution produced by kernel regression with the so-called Neural Tangent Kernel (NTK) matrix. (Sketches of the data-generating process and the two-layer initialization appear after the table.) |
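
The Research Type and Experiment Setup rows quote the paper's synthetic linear-regression setting and its use of PGD to approximate the robust estimator θ̂^M_{ε,min}. The following is a minimal sketch of that setting, not the authors' code: the data-generating process matches the quoted description (x ∼ N(0, I_2), y = 2x_1 − 3.4x_2 + η, η ∼ 0.1·N(0, 1)), while the alternating PGD routine, its step sizes, and its iteration counts are illustrative assumptions.

```python
# Hedged sketch of the quoted synthetic setup: ordinary least squares vs. a
# PGD-style robust fit under l2-bounded input perturbations of radius eps.
# Step sizes and iteration counts are illustrative, not taken from the paper.
import numpy as np

def generate_data(n, rng):
    """Draw n two-dimensional Gaussian inputs and noisy linear responses."""
    X = rng.standard_normal((n, 2))              # x ~ N(0, I_2)
    eta = 0.1 * rng.standard_normal(n)           # noise scaled by 0.1
    y = 2.0 * X[:, 0] - 3.4 * X[:, 1] + eta      # y = 2*x_1 - 3.4*x_2 + eta
    return X, y

def robust_fit_pgd(X, y, eps, outer_steps=2000, inner_steps=10,
                   lr_theta=0.05, lr_delta=0.1):
    """Approximate argmin_theta mean_i max_{||d_i|| <= eps} (y_i - theta.(x_i + d_i))^2
    by alternating projected gradient ascent on the perturbations d_i and
    gradient descent on theta."""
    n, p = X.shape
    theta = np.zeros(p)
    for _ in range(outer_steps):
        # Inner maximization: projected gradient ascent on each d_i.
        delta = np.zeros_like(X)
        for _ in range(inner_steps):
            resid = y - (X + delta) @ theta
            delta += lr_delta * (-2.0 * resid[:, None] * theta)          # ascent
            norms = np.linalg.norm(delta, axis=1, keepdims=True)
            delta *= np.minimum(1.0, eps / np.maximum(norms, 1e-12))     # project
        # Outer minimization: one gradient step on theta at the perturbed inputs.
        Xp = X + delta
        resid = y - Xp @ theta
        theta += lr_theta * 2.0 * Xp.T @ resid / n
    return theta

rng = np.random.default_rng(0)
X, y = generate_data(500, rng)
theta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # non-robust baseline
theta_rob = robust_fit_pgd(X, y, eps=0.1)
print(theta_ols, theta_rob)
# theta_ols is approximately [2.0, -3.4]; theta_rob is mildly shrunk toward
# zero, and the gap between the two closes as eps -> 0.
```

As in the quoted remark, the inner PGD only approximates the worst-case perturbation, which is why ε is never set exactly to 0 in the paper's experiments.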
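The Experiment Setup row also quotes the randomly initialized network whose gradient-descent solution Jacot et al. [2018] relate to NTK kernel regression. The sketch below assumes the standard two-layer ReLU parameterization f(x) = (1/√m) Σ_r a_r·ReLU(w_r·x) from the NTK literature; the width m, scale κ, learning rate, step count, and toy inputs (random vectors rather than MNIST) are illustrative assumptions, not values from the paper.

```python
# Hedged sketch of the quoted initialization, w_r ~ N(0, kappa^2 I) and
# a_r ~ U(-1, 1) for all r in [m], followed by gradient descent on the
# quadratic loss with the output weights a held fixed (common in NTK analyses).
import numpy as np

def init_params(m, d, kappa, rng):
    """Randomly initialize a width-m, input-dimension-d two-layer ReLU net."""
    W = kappa * rng.standard_normal((m, d))      # w_r ~ N(0, kappa^2 I)
    a = rng.uniform(-1.0, 1.0, size=m)           # a_r ~ U(-1, 1)
    return W, a

def forward(X, W, a):
    """f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x)."""
    return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(W.shape[0])

def gd_step(X, y, W, a, lr):
    """One gradient-descent step on 0.5 * mean_i (f(x_i) - y_i)^2 over W."""
    m = W.shape[0]
    pre = X @ W.T                                 # (n, m) pre-activations
    resid = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y
    # ReLU derivative is the indicator pre > 0.
    grad_W = ((resid[:, None] * (pre > 0.0)) * a / np.sqrt(m)).T @ X / len(y)
    return W - lr * grad_W

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 10))                # toy inputs, not MNIST
y = rng.standard_normal(300)
W, a = init_params(m=1024, d=10, kappa=1.0, rng=rng)
for _ in range(100):
    W = gd_step(X, y, W, a, lr=0.5)
print(np.mean((forward(X, W, a) - y) ** 2))       # lower than at initialization
```

In the wide-width limit, the function learned by this procedure coincides with kernel regression using the NTK matrix, which is the correspondence the paper exploits for its MNIST approximation experiment.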