Adversarial Learning for Feature Shift Detection and Correction

Authors: Míriam Barrabés, Daniel Mas Montserrat, Margarita Geleta, Xavier Giró-i-Nieto, Alexander Ioannidis

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We provide an in-depth experimental evaluation with multiple manipulation types and datasets. (Introduction, Contributions; Section 6, Experimental Results)
Researcher Affiliation | Collaboration | Míriam Barrabés (1,2), Daniel Mas Montserrat (1), Margarita Geleta (3), Xavier Giró-i-Nieto (4), Alexander G. Ioannidis (1). Affiliations: 1 Stanford University; 2 Universitat Politècnica de Catalunya; 3 University of California, Berkeley; 4 Amazon
Pseudocode | Yes | Algorithm 1: DF-Locate (page 4); Algorithm 2: DF-Correct (page 5)
Open Source Code | Yes | The code is available at https://github.com/AI-sandbox/DataFix. (Abstract)
Open Datasets | Yes | We use multiple datasets including UCI datasets such as Gas [76], Energy [77], and Musk2 [78], OpenML datasets including Scene [79], MNIST [80], and Dilbert [81], datasets with DNA sequences such as Founders [82] and a private Dog DNA dataset (Canine), a subset of phenotypes from the UK Biobank (Phenotypes) [83], COVID-19 data [84], and simple datasets including values generated from Cosine and Polynomial functions. (Section 6, Real-world datasets)
Dataset Splits | Yes | We use 5-fold train-evaluation such that 80% of the samples are used to train a random forest binary classifier Dθ(x), and the remaining 20% is used to estimate the empirical divergence with Nx and Ny testing samples of the reference and query datasets, respectively. (Section 4, Shift Detection). We use the simulated datasets to perform hyperparameter search for DataFix and all competing methods, while the real datasets are used as a hold-out testing set. (Section 6, Experimental Details). A sketch of this classifier-based divergence estimate follows the table.
Hardware Specification | Yes | All experiments were done with an Intel Xeon Gold with 12 CPU cores. (Section H, Computational Time)
Software Dependencies | No | The paper mentions software components such as random forests, gradient boosting trees, and CatBoost [75], but does not provide specific version numbers for these or other dependencies required for reproduction.
Experiment Setup | Yes | We use 5-fold train-evaluation such that 80% of the samples are used to train a random forest binary classifier Dθ(x)... (Section 4, Shift Detection). τ is a hyperparameter set by hyperparameter optimization... We use ϵ = 0.02 as the stopping threshold. (Section 4, DF-Locate). If the initial empirical divergence is already lower than ε = 0.1, the correction process is finalized. (Section 5, Initial Imputation). Typically, the number of epochs is set to 1 or 2. (Section 5, Iterative Process). Table 4: Search space and optimal values for tuned parameters in detection benchmarking methods. Table 5: Search space and optimal values for tuned parameters in correction benchmarking methods. (Section G.1, Hyperparameter Search). A sketch of the DF-Locate stopping loop follows the table.
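
The Dataset Splits row describes a classifier two-sample test: a random forest discriminator is trained to tell reference from query samples, and its held-out performance yields an empirical divergence. Below is a minimal sketch of that procedure, assuming the standard total-variation-style proxy 2·accuracy − 1; the function name empirical_divergence and the use of plain (unbalanced) accuracy are illustrative assumptions, not the paper's exact estimator.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold

def empirical_divergence(X_ref, X_query, n_splits=5, seed=0):
    # Label reference samples 0 and query samples 1, then run 5-fold
    # train-evaluation: 80% trains the discriminator D_theta(x), the
    # held-out 20% measures how separable the two datasets are.
    X = np.vstack([X_ref, X_query])
    y = np.concatenate([np.zeros(len(X_ref)), np.ones(len(X_query))])
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    scores = []
    for train_idx, test_idx in folds.split(X, y):
        clf = RandomForestClassifier(random_state=seed)
        clf.fit(X[train_idx], y[train_idx])
        acc = clf.score(X[test_idx], y[test_idx])
        # Chance accuracy (0.5) maps to zero divergence; perfect
        # separation of reference and query maps to one.
        scores.append(max(0.0, 2.0 * acc - 1.0))
    return float(np.mean(scores))

With n_splits=5 this reproduces the 80%/20% train-evaluation split quoted above; shifted datasets yield scores near one, identically distributed ones near zero.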
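
The Experiment Setup row quotes the ϵ = 0.02 stopping threshold for DF-Locate (Algorithm 1). The loop below is a hedged sketch of that iteration, reusing empirical_divergence from the previous sketch, under the assumption that the discriminator's feature importances flag shift-driving features; top_k and the greedy removal rule are illustrative stand-ins for the paper's τ-based selection heuristic.

def df_locate(X_ref, X_query, eps=0.02, top_k=1, max_iter=50):
    # Iteratively flag the features that most help the discriminator
    # separate reference from query, until the empirical divergence on
    # the remaining features drops below the stopping threshold eps.
    remaining = list(range(X_ref.shape[1]))  # candidate feature indices
    shifted = []                             # features flagged as corrupted
    for _ in range(max_iter):
        if not remaining:
            break
        if empirical_divergence(X_ref[:, remaining], X_query[:, remaining]) < eps:
            break
        X = np.vstack([X_ref[:, remaining], X_query[:, remaining]])
        y = np.concatenate([np.zeros(len(X_ref)), np.ones(len(X_query))])
        clf = RandomForestClassifier(random_state=0).fit(X, y)
        # Remove the top_k most important features and record them.
        order = np.argsort(clf.feature_importances_)[::-1][:top_k]
        for j in sorted(order.tolist(), reverse=True):
            shifted.append(remaining.pop(j))
    return sorted(shifted)

DF-Correct (Algorithm 2) would then impute the flagged features in the query set; the quoted ε = 0.1 check simply skips correction when the initial empirical divergence is already low.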