On the Accuracy of Influence Functions for Measuring Group Effects
Authors: Pang Wei Koh, Kai-Siang Ang, Hubert Teo, Percy Liang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we find that across many different types of groups and for a range of real-world datasets, the predicted effect (using influence functions) of a group correlates surprisingly well with its actual effect, even if the absolute and relative errors are large. Our theoretical analysis shows that such strong correlation arises only under certain settings and need not hold in general, indicating that real-world datasets have particular properties that allow the influence approximation to be accurate. |
| Researcher Affiliation | Academia | Pang Wei Koh Kai-Siang Ang Hubert H. K. Teo Percy Liang Department of Computer Science Stanford University {pangwei@cs, kaiang@, hteo@, pliang@cs}.stanford.edu |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code for replicating our experiments is available in the GitHub repository https://github.com/kohpangwei/group-influence-release. |
| Open Datasets | Yes | Here, we report results on 5 datasets chosen to span a range of applications, training set size n, and number of features d (Table 1). The first 4 datasets involve hospital readmission prediction, spam classification, and object recognition, and were used in Koh and Liang (2017) to study the influence of individual points. The fifth dataset is a chemical-disease relationship (CDR) dataset (Hancock et al., 2018). In Section 5, we will also study the MultiNLI language inference dataset (Williams et al., 2018). |
| Dataset Splits | No | The paper mentions "regularization λ selected by cross-validation" and uses terms like "training points" and "training data" but does not provide explicit training, validation, or test set splits (e.g., percentages, sample counts, or specific predefined splits with citations). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU, GPU models, or memory specifications). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, such as libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | On each dataset, we trained an L2-regularized logistic regression model (or softmax for the multiclass tasks) and compared the influences and actual effects of these subsets. ... with regularization λ selected by cross-validation. ... Overall, for each dataset, we constructed 1,700 subsets ranging in size from 0.25% to 25% of the training points. |
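
The "Experiment Setup" row above describes the core measurement: train an L2-regularized logistic regression model, predict the effect of removing a group of training points with influence functions, and compare that prediction to the actual effect obtained by retraining. The sketch below is a minimal illustration of that comparison, not the authors' released pipeline (which is at https://github.com/kohpangwei/group-influence-release): the synthetic data, the regularization strength `lam`, the single 5% group, and the use of average test loss as the evaluation metric are all placeholder assumptions.

```python
# Minimal sketch: influence-function prediction vs. actual effect of removing a group,
# for L2-regularized binary logistic regression. Synthetic data; lam and group size are placeholders.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)
n, d, lam = 2000, 20, 1e-3  # training size, feature dim, L2 strength (placeholders)

# Synthetic binary classification data with labels in {-1, +1}.
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = np.where(rng.random(n) < expit(X @ theta_true), 1.0, -1.0)
X_test = rng.normal(size=(200, d))
y_test = np.where(rng.random(200) < expit(X_test @ theta_true), 1.0, -1.0)

def per_example_grads(theta, X_sub, y_sub):
    # Row i = gradient of log(1 + exp(-y_i x_i^T theta)) w.r.t. theta.
    margins = y_sub * (X_sub @ theta)
    return -(y_sub * expit(-margins))[:, None] * X_sub

def objective(theta, X_sub, y_sub, n_norm):
    # (1/n_norm) * sum_i log(1 + exp(-y_i x_i^T theta)) + (lam/2) ||theta||^2
    margins = y_sub * (X_sub @ theta)
    return np.sum(np.logaddexp(0.0, -margins)) / n_norm + 0.5 * lam * theta @ theta

def objective_grad(theta, X_sub, y_sub, n_norm):
    return per_example_grads(theta, X_sub, y_sub).sum(axis=0) / n_norm + lam * theta

def fit(X_sub, y_sub, n_norm):
    res = minimize(objective, np.zeros(X_sub.shape[1]), args=(X_sub, y_sub, n_norm),
                   jac=objective_grad, method="L-BFGS-B")
    return res.x

def mean_test_loss(theta):
    return np.mean(np.logaddexp(0.0, -y_test * (X_test @ theta)))

theta_hat = fit(X, y, n)

# Hessian of the training objective at theta_hat: (1/n) X^T diag(p(1-p)) X + lam I.
p = expit(X @ theta_hat)
H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)

# Pick one group (5% of the training points here; the paper sweeps sizes from 0.25% to 25%).
group = rng.choice(n, size=n // 20, replace=False)

# Influence-function prediction: summed group gradient pushed through the inverse Hessian.
g_group = per_example_grads(theta_hat, X[group], y[group]).sum(axis=0)
delta_theta = np.linalg.solve(H, g_group) / n      # predicted parameter change from removal
grad_test = per_example_grads(theta_hat, X_test, y_test).mean(axis=0)
predicted = grad_test @ delta_theta                # predicted change in mean test loss

# Actual effect: retrain without the group (keeping the 1/n normalization) and re-evaluate.
mask = np.ones(n, dtype=bool)
mask[group] = False
actual = mean_test_loss(fit(X[mask], y[mask], n)) - mean_test_loss(theta_hat)

print(f"predicted effect: {predicted:+.6f}   actual effect: {actual:+.6f}")
```

Repeating this for many groups of varying size and composition, then correlating `predicted` with `actual`, mirrors the comparison over 1,700 subsets per dataset described in the setup row.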