Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Feature Shift Detection: Localizing Which Features Have Shifted via Conditional Distribution Tests
Authors: Sean Kulinski, Saurabh Bagchi, David I. Inouye
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We additionally develop methods for identifying when and where a shift occurs in multivariate time-series data and show results for multiple scenarios using realistic attack models on both simulated and real world data. ... 3 Experiments ... 3.1 Simulated Experiments ... 3.2 Experiments on Real-World Data |
| Researcher Affiliation | Academia | Sean M. Kulinski, Saurabh Bagchi, David I. Inouye; School of Electrical and Computer Engineering, Purdue University; EMAIL |
| Pseudocode | No | The paper does not contain any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | The code for our experiments and methods is at https://github.com/SeanKski/feature-shift. |
| Open Datasets | Yes | We present results on the UCI Appliance Energy Prediction dataset [4], UCI Gas Sensors for Home Activity Monitoring [10], and the number of new deaths from COVID-19 for the 10 states with the highest total deaths as of September 2020, measured by the CDC [1]. |
| Dataset Splits | No | The paper describes using 'bootstrap sampling to approximate the sampling distribution of the test statistic' and 'Time-Boot subsamples random contiguous chunks from clean held out data' for generating samples for statistical testing. However, it does not explicitly provide traditional dataset splits (e.g., 80/10/10%) for training, validation, and testing a model. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments, and it does not describe the computing environment at all. |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python 3.x, PyTorch 1.x, scikit-learn 0.x.x) that would be needed to reproduce the experiments. |
| Experiment Setup | Yes | Method Details. ... For the expectation over $x_{-j}$ in Def. 3, we use 30 samples from both $X_{-j}$ and $Y_{-j}$ to empirically approximate this expectation. For all methods, we use bootstrap sampling to approximate the sampling distribution of the test statistic $\gamma$ for each of the methods above. In particular, we bootstrap $B$ two-sample datasets ... We set the target significance level to $\alpha = 0.05$ as in [25] (note: this is for the detection stage only; the significance level for the localization stage is not explicitly set). |
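
The Experiment Setup row describes approximating the null sampling distribution of the test statistic γ by bootstrapping B two-sample datasets and testing at α = 0.05. The following is a minimal pure-Python sketch of that general idea. It is not the authors' implementation: it assumes a univariate two-sample Kolmogorov-Smirnov statistic as a stand-in for γ (the conditional statistics of Def. 3 are not reproduced), and the names `ks_statistic` and `bootstrap_threshold` and the default `B = 1000` are hypothetical.

```python
import math
import random
from typing import Callable, List, Optional, Sequence


# Hypothetical stand-in for the paper's statistic gamma: a univariate two-sample KS statistic.
def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Largest absolute gap between the empirical CDFs of x and y."""
    xs, ys = sorted(x), sorted(y)
    n, m = len(xs), len(ys)
    i = j = 0
    gap = 0.0
    while i < n and j < m:
        v = min(xs[i], ys[j])
        # Advance past every value <= v so tied values enter both CDFs at once.
        while i < n and xs[i] <= v:
            i += 1
        while j < m and ys[j] <= v:
            j += 1
        gap = max(gap, abs(i / n - j / m))
    return gap


def bootstrap_threshold(
    clean: Sequence[float],
    stat: Callable[[Sequence[float], Sequence[float]], float],
    n: int,
    B: int = 1000,        # number of bootstrap two-sample datasets (assumed value)
    alpha: float = 0.05,  # target significance level, as reported
    rng: Optional[random.Random] = None,
) -> float:
    """Approximate the (1 - alpha) quantile of `stat` under the null of no shift.

    Both halves of every bootstrap dataset are resampled with replacement from
    the same clean reference data, so the statistic reflects null variation only.
    """
    rng = rng or random.Random(0)
    null_stats: List[float] = []
    for _ in range(B):
        a = rng.choices(clean, k=n)
        b = rng.choices(clean, k=n)
        null_stats.append(stat(a, b))
    null_stats.sort()
    k = min(B - 1, math.ceil((1 - alpha) * B) - 1)
    return null_stats[k]


if __name__ == "__main__":
    rng = random.Random(42)
    clean = [rng.gauss(0.0, 1.0) for _ in range(500)]
    shifted = [rng.gauss(2.0, 1.0) for _ in range(200)]  # simulated mean shift
    tau = bootstrap_threshold(clean, ks_statistic, n=200, B=200, rng=rng)
    observed = ks_statistic(rng.choices(clean, k=200), shifted)
    print(f"threshold={tau:.3f} observed={observed:.3f} shift={observed > tau}")
```

A shift is flagged when the observed statistic exceeds the bootstrapped threshold; under the null this happens in roughly an α fraction of tests, which is what setting α = 0.05 controls.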
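
The Dataset Splits row quotes the paper's Time-Boot procedure, which subsamples random contiguous chunks from clean held-out data rather than resampling individual points. Below is a minimal sketch of that sampling step, assuming a univariate series, a fixed window length, and the absolute difference of chunk means as a placeholder statistic. The function names `time_boot_chunks` and `time_boot_null` are hypothetical, and this is not the authors' code.

```python
import random
from typing import List, Sequence


def time_boot_chunks(
    series: Sequence[float], window: int, n_chunks: int, rng: random.Random
) -> List[List[float]]:
    """Draw n_chunks random contiguous windows of length `window` from a clean series."""
    if not 0 < window <= len(series):
        raise ValueError("window must be between 1 and len(series)")
    chunks = []
    for _ in range(n_chunks):
        # Every valid start index is equally likely.
        start = rng.randrange(len(series) - window + 1)
        chunks.append(list(series[start:start + window]))
    return chunks


def time_boot_null(
    series: Sequence[float], window: int, B: int, rng: random.Random
) -> List[float]:
    """Sorted null statistics from B pairs of clean chunks.

    Placeholder statistic (assumption): absolute difference of the chunk means.
    """
    stats = []
    for _ in range(B):
        a, b = time_boot_chunks(series, window, 2, rng)
        stats.append(abs(sum(a) / window - sum(b) / window))
    return sorted(stats)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical clean series: a slow random walk, which is strongly autocorrelated.
    walk, level = [], 0.0
    for _ in range(1000):
        level += rng.gauss(0.0, 0.1)
        walk.append(level)
    null = time_boot_null(walk, window=50, B=500, rng=rng)
    print("95th percentile of null statistic:", round(null[int(0.95 * len(null))], 3))
```

Keeping each chunk contiguous preserves the within-window temporal dependence of the series, which resampling individual time steps independently would destroy.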