On the Accuracy of Influence Functions for Measuring Group Effects
Authors: Pang Wei Koh, Kai-Siang Ang, Hubert Teo, Percy Liang
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we find that across many different types of groups and for a range of real-world datasets, the predicted effect (using influence functions) of a group correlates surprisingly well with its actual effect, even if the absolute and relative errors are large. Our theoretical analysis shows that such strong correlation arises only under certain settings and need not hold in general, indicating that real-world datasets have particular properties that allow the influence approximation to be accurate. |
| Researcher Affiliation | Academia | Pang Wei Koh Kai-Siang Ang Hubert H. K. Teo Percy Liang Department of Computer Science Stanford University {pangwei@cs, kaiang@, hteo@, pliang@cs}.stanford.edu |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | The code for replicating our experiments is available in the GitHub repository https://github.com/kohpangwei/group-influence-release. |
| Open Datasets | Yes | Here, we report results on 5 datasets chosen to span a range of applications, training set size n, and number of features d (Table 1). The first 4 datasets involve hospital readmission prediction, spam classification, and object recognition, and were used in Koh and Liang (2017) to study the influence of individual points. The fifth dataset is a chemical-disease relationship (CDR) dataset (Hancock et al., 2018). In Section 5, we will also study the MultiNLI language inference dataset (Williams et al., 2018). |
| Dataset Splits | No | The paper mentions "regularization λ selected by cross-validation" and uses terms like "training points" and "training data" but does not provide explicit training, validation, or test set splits (e.g., percentages, sample counts, or specific predefined splits with citations). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU, GPU models, or memory specifications). |
| Software Dependencies | No | The paper does not provide specific version numbers for any software dependencies, such as libraries, frameworks, or programming languages used. |
| Experiment Setup | Yes | On each dataset, we trained an L2-regularized logistic regression model (or softmax for the multiclass tasks) and compared the influences and actual effects of these subsets. ... with regularization λ selected by cross-validation. ... Overall, for each dataset, we constructed 1,700 subsets ranging in size from 0.25% to 25% of the training points. |
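
The "Experiment Setup" row above describes the core measurement: train an L2-regularized logistic regression model, predict the effect of removing a group of training points with influence functions, and compare that prediction to the actual effect obtained by retraining. The sketch below is a minimal illustration of that comparison, not the authors' released pipeline (which is at https://github.com/kohpangwei/group-influence-release): the synthetic data, the regularization strength `lam`, the single 5% group, and the use of average test loss as the evaluation metric are all placeholder assumptions.

```python
# Minimal sketch: influence-function prediction vs. actual effect of removing a group,
# for L2-regularized binary logistic regression. Synthetic data; lam and group size are placeholders.
import numpy as np
from scipy.optimize import minimize
from scipy.special import expit  # numerically stable sigmoid

rng = np.random.default_rng(0)
n, d, lam = 2000, 20, 1e-3  # training size, feature dim, L2 strength (placeholders)

# Synthetic binary classification data with labels in {-1, +1}.
X = rng.normal(size=(n, d))
theta_true = rng.normal(size=d)
y = np.where(rng.random(n) < expit(X @ theta_true), 1.0, -1.0)
X_test = rng.normal(size=(200, d))
y_test = np.where(rng.random(200) < expit(X_test @ theta_true), 1.0, -1.0)

def per_example_grads(theta, X_sub, y_sub):
    # Row i = gradient of log(1 + exp(-y_i x_i^T theta)) w.r.t. theta.
    margins = y_sub * (X_sub @ theta)
    return -(y_sub * expit(-margins))[:, None] * X_sub

def objective(theta, X_sub, y_sub, n_norm):
    # (1/n_norm) * sum_i log(1 + exp(-y_i x_i^T theta)) + (lam/2) ||theta||^2
    margins = y_sub * (X_sub @ theta)
    return np.sum(np.logaddexp(0.0, -margins)) / n_norm + 0.5 * lam * theta @ theta

def objective_grad(theta, X_sub, y_sub, n_norm):
    return per_example_grads(theta, X_sub, y_sub).sum(axis=0) / n_norm + lam * theta

def fit(X_sub, y_sub, n_norm):
    res = minimize(objective, np.zeros(X_sub.shape[1]), args=(X_sub, y_sub, n_norm),
                   jac=objective_grad, method="L-BFGS-B")
    return res.x

def mean_test_loss(theta):
    return np.mean(np.logaddexp(0.0, -y_test * (X_test @ theta)))

theta_hat = fit(X, y, n)

# Hessian of the training objective at theta_hat: (1/n) X^T diag(p(1-p)) X + lam I.
p = expit(X @ theta_hat)
H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)

# Pick one group (5% of the training points here; the paper sweeps sizes from 0.25% to 25%).
group = rng.choice(n, size=n // 20, replace=False)

# Influence-function prediction: summed group gradient pushed through the inverse Hessian.
g_group = per_example_grads(theta_hat, X[group], y[group]).sum(axis=0)
delta_theta = np.linalg.solve(H, g_group) / n      # predicted parameter change from removal
grad_test = per_example_grads(theta_hat, X_test, y_test).mean(axis=0)
predicted = grad_test @ delta_theta                # predicted change in mean test loss

# Actual effect: retrain without the group (keeping the 1/n normalization) and re-evaluate.
mask = np.ones(n, dtype=bool)
mask[group] = False
actual = mean_test_loss(fit(X[mask], y[mask], n)) - mean_test_loss(theta_hat)

print(f"predicted effect: {predicted:+.6f}   actual effect: {actual:+.6f}")
```

Repeating this for many groups of varying size and composition, then correlating `predicted` with `actual`, mirrors the comparison over 1,700 subsets per dataset described in the setup row.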