reproducibilityindex.ai

Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests

Authors: Sean Kulinski, Saurabh Bagchi, David I. Inouye

NeurIPS 2020

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We additionally develop methods for identifying when and where a shift occurs in multivariate time-series data and show results for multiple scenarios using realistic attack models on both simulated and real world data. ... 3 Experiments ... 3.1 Simulated Experiments ... 3.2 Experiments on Real-World Data
Researcher Affiliation	Academia	Sean M. Kulinski Saurabh Bagchi David I. Inouye School of Electrical and Computer Engineering Purdue University {skulinsk,sbagchi,dinouye}@purdue.edu
Pseudocode	No	The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code	Yes	The code for our experiments and methods is at https://github.com/SeanKski/feature-shift.
Open Datasets	Yes	We present results on the UCI Appliance Energy Prediction dataset [4], UCI Gas Sensors for Home Activity Monitoring [10], and the number of new deaths from COVID-19 for the 10 states with the highest total deaths as of September 2020, measured by the CDC [1].
Dataset Splits	No	The paper describes using 'bootstrap sampling to approximate the sampling distribution of the test statistic' and 'Time-Boot subsamples random contiguous chunks from clean held out data' for generating samples for statistical testing. However, it does not explicitly provide traditional dataset splits (e.g., 80/10/10%) for training, validation, and testing a model.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies	No	The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, scikit-learn 0.x.x) that would be needed to reproduce the experiments.
Experiment Setup	Yes	Method Details. ... For the expectation over x j in Def. 3, we use 30 samples from both X j and Y j to empirically approximate this expectation. For all methods, we use bootstrap sampling to approximate the sampling distribution of the test statistic γ for each of the methods above. In particular, we bootstrap B two-sample datasets ... We set the target significance level to α = 0.05 as in [25] (note: this is for the detection stage only; the significance level for the localization stage is not explicitly set).
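
The bootstrap step quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' implementation (their code is in the repository linked above). It assumes a pooled-resampling bootstrap and uses an absolute difference of sample means as a stand-in for the paper's test statistic γ. The function names, the default n_boot (playing the role of B), and the seed are all hypothetical choices made for illustration.

```python
import numpy as np


def mean_shift_statistic(x, y):
    """Stand-in statistic (assumption): absolute difference of sample means."""
    return abs(float(np.mean(x)) - float(np.mean(y)))


def bootstrap_threshold(x, y, statistic, n_boot=200, alpha=0.05, seed=0):
    """Approximate the null sampling distribution of `statistic` by drawing
    n_boot two-sample datasets (with replacement) from the pooled data, then
    return the (1 - alpha) quantile as the detection threshold."""
    rng = np.random.default_rng(seed)
    pooled = np.concatenate([x, y])
    null_stats = np.empty(n_boot)
    for b in range(n_boot):
        # Both bootstrap samples come from the same pool, so there is no shift.
        xb = rng.choice(pooled, size=len(x), replace=True)
        yb = rng.choice(pooled, size=len(y), replace=True)
        null_stats[b] = statistic(xb, yb)
    return float(np.quantile(null_stats, 1.0 - alpha))


def detect_shift(x, y, statistic=mean_shift_statistic,
                 n_boot=200, alpha=0.05, seed=0):
    """Flag a shift when the observed statistic exceeds the bootstrap threshold."""
    threshold = bootstrap_threshold(x, y, statistic, n_boot, alpha, seed)
    observed = statistic(x, y)
    return observed > threshold, observed, threshold


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    reference = rng.normal(0.0, 1.0, size=500)  # clean feature values
    shifted = rng.normal(3.0, 1.0, size=500)    # same feature after a large shift
    flagged, observed, threshold = detect_shift(reference, shifted)
    print(f"flagged={flagged} observed={observed:.3f} threshold={threshold:.3f}")
```

With alpha = 0.05, a shift is reported only when the observed statistic falls in the top 5% of the bootstrapped null values. That matches the detection-stage significance level quoted in the table, though the statistic here is only a placeholder.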