Derandomized novelty detection with FDR control via conformal e-values

Authors: Meshi Bashari, Amir Epstein, Yaniv Romano, Matteo Sesia

NeurIPS 2023

Reproducibility assessment. Each entry below gives the reproducibility variable, the result, and the supporting LLM response quoting the paper.
Research Type: Experimental
LLM Response: Section 4 (Numerical experiments): "This section compares empirically the performance of AdaDetect and our proposed derandomized method described in Section 3.3, namely E-AdaDetect. Both procedures are deployed using a binary logistic regression classifier (Marandon et al., 2022) as the base predictive model."

Researcher Affiliation: Collaboration
LLM Response: Meshi Bashari, Department of Electrical and Computer Engineering, Technion - Israel Institute of Technology, Haifa, Israel (meshi.b@campus.technion.ac.il); Amir Epstein, Citi Innovation Lab, Tel Aviv, Israel (amir.epstein@citi.com); Yaniv Romano, Department of Electrical and Computer Engineering and Department of Computer Science, Technion - Israel Institute of Technology, Haifa, Israel (yromano@technion.ac.il); Matteo Sesia, Department of Data Sciences and Operations, University of Southern California, Los Angeles, California, USA (sesia@marshall.usc.edu).

Pseudocode: Yes
LLM Response: "Having calculated aggregate e-values e_j with the procedure described above, which is outlined by Algorithm S1 in the Supplementary Material, our method rejects the null hypothesis for all j ∈ D_test whose e_j is greater than an adaptive threshold calculated by applying the e-BH filter of Wang and Ramdas (2022), which is outlined for completeness by Algorithm S2 in the Supplementary Material."

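For context, the e-BH filter referenced above can be sketched in a few lines. This is a generic rendering of the Wang and Ramdas (2022) procedure rather than the paper's Algorithm S2, and the function name ebh_rejections is illustrative (assuming numpy):

    import numpy as np

    def ebh_rejections(e_values, alpha):
        """Sketch of the e-BH filter of Wang and Ramdas (2022).

        Sorts the e-values in decreasing order, finds the largest k such
        that the k-th largest e-value is at least m / (alpha * k), and
        rejects the corresponding k hypotheses. Returns rejected indices.
        """
        e_values = np.asarray(e_values, dtype=float)
        m = len(e_values)
        order = np.argsort(-e_values)          # indices by decreasing e-value
        ks = np.arange(1, m + 1)
        passing = e_values[order] >= m / (alpha * ks)
        if not passing.any():
            return np.array([], dtype=int)     # no rejections
        k_star = ks[passing].max()             # largest k meeting the bound
        return order[:k_star]

For example, with e-values (40, 25, 3, 0.5) and alpha = 0.1, the thresholds m / (alpha * k) are 40, 20, 13.3, and 10, so only the two largest e-values clear their thresholds and the corresponding two hypotheses are rejected.
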
Open Source Code: Yes
LLM Response: "Software implementing the algorithms described in this paper and enabling the reproduction of the associated numerical experiments is available at https://github.com/Meshiba/derandomized-novelty-detection."

Open Datasets: Yes
LLM Response: "To further demonstrate the effectiveness of data-driven weighting, we turn to analyze the performance of E-AdaDetect on four real-world outlier detection data sets: musk, shuttle, KDDCup99, and credit card. We refer to Supplementary Section S6 for more information regarding these data."

Dataset Splits: Yes
LLM Response: "The size of the reference set is n = 2000, with 1000 samples in the training subset and 1000 in the calibration subset."

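A minimal sketch of the quoted split, assuming numpy and an arbitrary random seed (both illustrative; the paper quote specifies only the 1000/1000 subset sizes, not how the partition is drawn):

    import numpy as np

    rng = np.random.default_rng(0)  # seed is an assumption
    n = 2000                        # reference set size quoted above
    idx = rng.permutation(n)
    train_idx, calib_idx = idx[:1000], idx[1000:]  # 1000 training, 1000 calibration
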
Hardware Specification: No
LLM Response: The paper does not provide specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments.

Software Dependencies: No
LLM Response: The paper mentions using a "binary logistic regression classifier", "random forests implemented with varying max-depth hyper-parameters", and "support vector machines with an RBF kernel", but it does not provide version numbers for the software libraries implementing them or for their dependencies.

Experiment Setup: Yes
LLM Response: "Specifically, we fit a sparse logistic regression model using K = 10 different values of the regularization parameter. To induce higher variability in the predictive rules, one model was trained with a regularization parameter equal to 0.0001, while the others were trained with regularization parameters equal to 1, 10, 50, and 100, respectively. ... Half of the models are random forests implemented with varying max-depth hyper-parameters (10, 12, 20, 30, and 7), while the other half are support vector machines with an RBF kernel with varying width hyper-parameters (0.1, 0.001, 0.5, 0.2, and 0.03)."

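Assuming scikit-learn as the underlying library (the paper does not name it), the quoted model ensembles could be instantiated as sketched below. Mapping the quoted regularization parameters to LogisticRegression's inverse-strength C, and reading the quoted RBF "width" values as SVC's gamma parameter, are assumptions on our part:

    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC

    # Sparse (L1-penalized) logistic regressions; scikit-learn's C is the
    # inverse of the regularization strength, so this mapping is assumed.
    logreg_models = [
        LogisticRegression(penalty="l1", solver="liblinear", C=1.0 / lam)
        for lam in (0.0001, 1, 10, 50, 100)
    ]

    # Mixed ensemble: five random forests with the quoted max depths and
    # five RBF-kernel SVMs, reading "width" as the gamma parameter (assumed).
    forest_models = [RandomForestClassifier(max_depth=d) for d in (10, 12, 20, 30, 7)]
    svm_models = [SVC(kernel="rbf", gamma=g) for g in (0.1, 0.001, 0.5, 0.2, 0.03)]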