Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests

Authors: Sean Kulinski, Saurabh Bagchi, David I. Inouye

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We additionally develop methods for identifying when and where a shift occurs in multivariate time-series data and show results for multiple scenarios using realistic attack models on both simulated and real world data." (Sections 3 Experiments, 3.1 Simulated Experiments, 3.2 Experiments on Real-World Data)
Researcher Affiliation | Academia | Sean M. Kulinski, Saurabh Bagchi, David I. Inouye — School of Electrical and Computer Engineering, Purdue University ({skulinsk,sbagchi,dinouye}@purdue.edu)
Pseudocode | No | The paper does not contain any explicitly labeled "Pseudocode" or "Algorithm" blocks.
Open Source Code | Yes | "The code for our experiments and methods is at https://github.com/SeanKski/feature-shift."
Open Datasets | Yes | "We present results on the UCI Appliance Energy Prediction dataset [4], UCI Gas Sensors for Home Activity Monitoring [10], and the number of new deaths from COVID-19 for the 10 states with the highest total deaths as of September 2020, measured by the CDC [1]."
Dataset Splits | No | The paper describes using "bootstrap sampling to approximate the sampling distribution of the test statistic" and notes that "Time-Boot subsamples random contiguous chunks from clean held out data" to generate samples for statistical testing, but it does not provide traditional train/validation/test splits (e.g., 80/10/10%) for a model.
Hardware Specification | No | The paper does not provide hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not list software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, scikit-learn 0.x.x) that would be needed to reproduce the experiments.
Experiment Setup | Yes | "Method Details. ... For the expectation over x_j in Def. 3, we use 30 samples from both X_j and Y_j to empirically approximate this expectation. For all methods, we use bootstrap sampling to approximate the sampling distribution of the test statistic γ for each of the methods above. In particular, we bootstrap B two-sample datasets ... We set the target significance level to α = 0.05 as in [25]." (Note: this is for the detection stage only; the significance level for the localization stage is not explicitly set.)
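The "Time-Boot" scheme quoted in the Dataset Splits row — subsampling random contiguous chunks from clean held-out data — can be sketched as below. This is not the authors' implementation; the function name, array layout (time steps by features), and window parameter are assumptions for illustration:

```python
import numpy as np

def time_boot_sample(clean_data, window_size, rng=None):
    """Draw one random contiguous chunk from clean held-out time-series
    data of shape (T, d), in the spirit of the paper's Time-Boot scheme.

    Sampling contiguous windows (rather than i.i.d. rows) preserves the
    temporal dependence structure of the held-out series.
    """
    rng = np.random.default_rng(rng)
    # Choose a valid start index so the chunk fits inside the series.
    start = rng.integers(0, len(clean_data) - window_size + 1)
    return clean_data[start:start + window_size]
```

Repeating this draw B times yields the bootstrap replicates over which a test-statistic distribution can be approximated.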
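The detection test described in the Experiment Setup row — bootstrapping B two-sample datasets to approximate the sampling distribution of a test statistic, then testing at α = 0.05 — can be sketched generically as follows. The statistic here is a placeholder (absolute mean difference) standing in for the paper's conditional-distribution score γ, and resampling from the pooled data is one assumed way to simulate the null; neither is taken from the paper:

```python
import numpy as np

def bootstrap_two_sample_test(stat_fn, x, y, B=1000, alpha=0.05, rng=None):
    """Approximate the sampling distribution of a test statistic by
    bootstrapping B two-sample datasets under the null hypothesis
    (here: resampling from the pooled data), then flag a shift when
    the observed statistic is extreme at significance level alpha."""
    rng = np.random.default_rng(rng)
    observed = stat_fn(x, y)
    pooled = np.concatenate([x, y])  # null: both samples share one distribution
    null_stats = np.empty(B)
    for b in range(B):
        xb = rng.choice(pooled, size=len(x), replace=True)
        yb = rng.choice(pooled, size=len(y), replace=True)
        null_stats[b] = stat_fn(xb, yb)
    # Add-one smoothing keeps the estimated p-value strictly positive.
    p_value = (1 + np.sum(null_stats >= observed)) / (B + 1)
    return bool(p_value < alpha), float(p_value)
```

For example, with `stat_fn = lambda a, b: abs(a.mean() - b.mean())`, a large mean shift between `x` and `y` produces a small p-value and a positive detection at the paper's target level of 0.05.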