Interpreting Robust Optimization via Adversarial Influence Functions

Authors: Zhun Deng, Cynthia Dwork, Jialiang Wang, Linjun Zhang

ICML 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Figure 1, we can see that as ε becomes smaller, Î(n, ε) and Ŝ(n, ε) gradually go to 0. We remark here that we do not let ε be exactly 0 in our experiments, since PGD cannot obtain the exact optimal solutions for θ̂^M_min and θ̂^M_{ε,min}. The model we use is a linear regression model with 500 inputs drawn from a two-dimensional standard Gaussian, i.e. x ~ N(0, I). We fit y with y = 2x_1 − 3.4x_2 + η and η ~ 0.1 · N(0, I). In the experiments in Figure 2(a), we show the trend for Ŝ_ε(G_L) by taking sample size n = 5000 and σ_ξ = 0.1. We take the average result over 1000 repetitions. In Figure 2(b), we experimentally demonstrate the effectiveness of the approximation of AIF in kernel regressions with the neural tangent kernel on MNIST. The estimation is based on the average over 10 draws of 300 randomly sampled MNIST examples. (See the data-generation sketch after the table.)
Researcher Affiliation | Academia | ¹John A. Paulson School of Engineering and Applied Sciences, Harvard University; ²Department of Statistics, Rutgers University.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide any concrete access to source code for the methodology described.
Open Datasets | Yes | In Figure 2(b), we experimentally demonstrate the effectiveness of the approximation of AIF in kernel regressions with the neural tangent kernel on MNIST. The estimation is based on the average over 10 draws of 300 randomly sampled MNIST examples.
Dataset Splits | No | We have a training dataset (X^t, Y^t) = {(x^t_1, y^t_1), …, (x^t_{n_t}, y^t_{n_t})} and an evaluation dataset (X^e, Y^e) = {(x^e_1, y^e_1), …, (x^e_{n_e}, y^e_{n_e})}. The paper mentions the existence of training and evaluation datasets but does not provide specific split percentages, sample counts, or methodology for creating these splits from a larger dataset.
Hardware Specification | No | The paper does not provide any specific hardware details (e.g., GPU/CPU models, memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions using projected gradient descent (PGD) for robust optimization but does not specify any programming languages, libraries, frameworks, or their version numbers that were used for implementation.
Experiment Setup | Yes | The model we use is a linear regression model with 500 inputs drawn from a two-dimensional standard Gaussian, i.e. x ~ N(0, I). We fit y with y = 2x_1 − 3.4x_2 + η and η ~ 0.1 · N(0, I). In the experiments in Figure 2(a), we show the trend for Ŝ_ε(G_L) by taking sample size n = 5000 and σ_ξ = 0.1. We train the neural network by randomly initialized gradient descent on the quadratic loss over data S. In particular, we initialize the parameters randomly: w_r ~ N(0, κ²I) and a_r ~ Unif(−1, 1) for all r ∈ [m]; Jacot et al. [2018] then showed that such a network converges to the solution produced by kernel regression with the so-called Neural Tangent Kernel (NTK) matrix.
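As a concrete illustration of the synthetic setup quoted in the Research Type and Experiment Setup rows, the sketch below generates the described data (x ~ N(0, I) in two dimensions, y = 2x_1 − 3.4x_2 + η with η ~ 0.1 · N(0, I)) and fits the non-robust least-squares estimate. This is a minimal sketch under stated assumptions: the function names, seed, and default sample size are ours, and the paper's robust estimator θ̂^M_{ε,min} obtained with PGD is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed is an arbitrary choice for reproducibility

def make_synthetic_data(n=500, noise_scale=0.1):
    """Synthetic regression data as described in the quoted setup:
    x ~ N(0, I_2), y = 2*x1 - 3.4*x2 + eta, eta ~ noise_scale * N(0, 1)."""
    X = rng.standard_normal((n, 2))             # two-dimensional standard Gaussian inputs
    eta = noise_scale * rng.standard_normal(n)  # Gaussian noise scaled by 0.1
    y = 2.0 * X[:, 0] - 3.4 * X[:, 1] + eta
    return X, y

def fit_least_squares(X, y):
    """Ordinary least-squares estimate (the non-robust baseline theta_hat_min)."""
    theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

if __name__ == "__main__":
    X, y = make_synthetic_data()
    theta_hat = fit_least_squares(X, y)
    print("theta_hat_min ~", theta_hat)  # should be close to [2.0, -3.4]
```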
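The Experiment Setup quotation ends just before the paper's NTK matrix definition. As a hedged sketch only, the snippet below applies the quoted random initialization, w_r ~ N(0, κ²I) and a_r ~ Unif(−1, 1), to the two-layer ReLU parameterization standard in the NTK literature; the 1/√m output scaling, the width m, and the value of κ are assumptions not stated in the quote, and "Unif(−1, 1)" may alternatively denote the Rademacher {−1, +1} draw used in some NTK papers.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_two_layer_relu(m, d, kappa=0.1):
    """Random initialization from the quoted NTK setup:
    w_r ~ N(0, kappa^2 I_d) and a_r ~ Unif(-1, 1) for r = 1..m.
    (Some NTK papers instead draw a_r uniformly from {-1, +1}.)"""
    W = kappa * rng.standard_normal((m, d))  # hidden-layer weights, rows are w_r
    a = rng.uniform(-1.0, 1.0, size=m)       # output-layer weights a_r
    return W, a

def two_layer_relu(X, W, a):
    """f(x) = (1/sqrt(m)) * sum_r a_r * relu(w_r . x) -- the standard two-layer
    parameterization under which gradient descent tracks NTK regression; the
    1/sqrt(m) scaling is assumed, not taken from the quote."""
    m = W.shape[0]
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

if __name__ == "__main__":
    X = rng.standard_normal((4, 2))                 # a few example inputs
    W, a = init_two_layer_relu(m=1024, d=2)
    print(two_layer_relu(X, W, a))                  # network outputs at initialization
```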