Stochastic Amortization: A Unified Approach to Accelerate Feature and Data Attribution

Authors: Ian Covert, Chanwoo Kim, Su-In Lee, James Y. Zou, Tatsunori B. Hashimoto

NeurIPS 2024

Reproducibility variables (each entry lists the variable, the result, and the LLM response with supporting excerpts from the paper):
Research Type: Experimental
LLM Response: "Our experiments demonstrate significant speedups for several of these tasks: we find that amortizing across an entire dataset with noisy labels is often more efficient than current per-example approximations, especially for large datasets, and that amortized feature and data attribution models generalize well to unseen examples. Experimentally, we test multiple estimators for Shapley value feature attributions and find that amortization works when the labels are unbiased (Section 5)."
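
The mechanism this excerpt refers to, training an amortized model on noisy but unbiased attribution labels, can be sketched as follows. This is a minimal illustration in PyTorch, not the authors' code: the `train_amortized_model` helper and its arguments are hypothetical, and the key assumption is that each target is an unbiased estimate of the true attribution (e.g., a cheap Monte Carlo Shapley estimate).

    import torch
    import torch.nn as nn

    def train_amortized_model(model, loader, epochs=10, lr=1e-3):
        """Regress model(x) onto noisy attribution targets.

        `loader` yields (x, y_noisy) pairs where E[y_noisy] equals the true
        attribution. Unbiasedness of the labels is what lets the MSE
        minimizer recover the clean attribution function.
        """
        optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
        loss_fn = nn.MSELoss()
        for _ in range(epochs):
            for x, y_noisy in loader:
                optimizer.zero_grad()
                loss = loss_fn(model(x), y_noisy)
                loss.backward()
                optimizer.step()
        return model

Once trained, the amortized model produces attributions for unseen examples with a single forward pass, which is the source of the reported speedups.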
Researcher Affiliation: Academia
LLM Response: Ian Covert (Stanford University, icovert@stanford.edu); Chanwoo Kim (University of Washington, chanwkim@uw.edu); Su-In Lee (University of Washington, suinlee@uw.edu); James Zou (Stanford University, jamesz@stanford.edu); Tatsunori Hashimoto (Stanford University, thashim@stanford.edu)
Pseudocode: No
LLM Response: The paper does not contain any blocks explicitly labeled as "Pseudocode" or "Algorithm".
Open Source Code: Yes
LLM Response: "We provide two repositories to reproduce each of our results: Feature attribution: https://github.com/chanwkimlab/amortized-attribution; Data valuation: https://github.com/iancovert/amortized-valuation"
Open Datasets: Yes
LLM Response: "We used publicly available, open-source datasets for our experiments. For the feature attribution experiments, we used the ImageNette dataset [46, 23]... For the data valuation experiments, we used two tabular datasets from the UCI repository: the MiniBooNE particle classification dataset [84] and the adult census income classification dataset [25]... For CIFAR-10, we used 50K examples for training and 1K for validation."
Dataset Splits: Yes
LLM Response: "The validation set was used to perform early stopping, and the test set was only used to evaluate the model's performance on external data. ...For the data valuation experiments, we used variable numbers of training examples ranging from 250 to 10K, and we reserved 100 examples for validation in each case... For CIFAR-10, we used 50K examples for training and 1K for validation."
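
For illustration, the reported CIFAR-10 split sizes (50K training, 1K validation) can be reproduced with torchvision as sketched below. Drawing the 1K validation examples from the held-out test partition is our assumption; the excerpt states only the split sizes.

    import torchvision
    from torch.utils.data import Subset

    # Reported sizes: 50K training examples, 1K validation examples.
    train_set = torchvision.datasets.CIFAR10(root="data", train=True, download=True)
    held_out = torchvision.datasets.CIFAR10(root="data", train=False, download=True)
    # Assumption: validation examples come from the held-out partition.
    val_set = Subset(held_out, range(1000))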
Hardware Specification: Yes
LLM Response: "For the feature attribution experiments, we used a single GeForce RTX 2080Ti GPU to train the amortized models. For the data valuation experiments, we used a single RTX A6000 to train amortized models, and these each trained in under an hour."
Software Dependencies: No
LLM Response: The paper mentions software components such as the "AdamW optimizer", "Adam", and the "OpenDataVal package" but does not specify their version numbers (e.g., PyTorch 1.9, Python 3.8).
Experiment Setup: Yes
LLM Response: "For the feature attribution experiments, we optimized the models using the AdamW optimizer [67] with a linearly decaying learning rate schedule. The maximum learning rate was tuned using the validation loss; we trained for up to 100 epochs, and we selected the best model based on the validation loss. For the data valuation experiments, we optimized the models using Adam [55] with a cosine learning rate schedule. The maximum learning rate, the number of training epochs, and the best model from the training run were determined using the validation loss."
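
The two optimization recipes in this excerpt map onto standard PyTorch components, as sketched below. The models and learning rates are placeholders, and the validation-based tuning and checkpoint selection described in the excerpt are omitted here.

    import torch
    import torch.nn as nn

    # Feature attribution: AdamW with a linearly decaying schedule,
    # trained for up to 100 epochs (per the excerpt).
    model_fa = nn.Linear(10, 1)  # stand-in for the amortized attribution model
    opt_fa = torch.optim.AdamW(model_fa.parameters(), lr=1e-4)
    sched_fa = torch.optim.lr_scheduler.LinearLR(
        opt_fa, start_factor=1.0, end_factor=0.0, total_iters=100
    )

    # Data valuation: Adam with a cosine learning rate schedule.
    model_dv = nn.Linear(10, 1)  # stand-in for the amortized valuation model
    opt_dv = torch.optim.Adam(model_dv.parameters(), lr=1e-3)
    sched_dv = torch.optim.lr_scheduler.CosineAnnealingLR(opt_dv, T_max=100)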