Derandomized novelty detection with FDR control via conformal e-values
Authors: Meshi Bashari, Amir Epstein, Yaniv Romano, Matteo Sesia
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | From Section 4 (Numerical experiments): This section compares empirically the performance of AdaDetect and our proposed derandomized method described in Section 3.3, namely E-AdaDetect. Both procedures are deployed using a binary logistic regression classifier (Marandon et al., 2022) as the base predictive model. |
| Researcher Affiliation | Collaboration | Meshi Bashari, Department of Electrical and Computer Engineering, Technion IIT, Haifa, Israel (meshi.b@campus.technion.ac.il); Amir Epstein, Citi Innovation Lab, Tel Aviv, Israel (amir.epstein@citi.com); Yaniv Romano, Department of Electrical and Computer Engineering and Department of Computer Science, Technion IIT, Haifa, Israel (yromano@technion.ac.il); Matteo Sesia, Department of Data Sciences and Operations, University of Southern California, Los Angeles, California, USA (sesia@marshall.usc.edu) |
| Pseudocode | Yes | Having calculated aggregate e-values e_j with the procedure described above, which is outlined by Algorithm S1 in the Supplementary Material, our method rejects the null hypothesis for all j ∈ Dtest whose e_j is greater than an adaptive threshold calculated by applying the e-BH filter of Wang and Ramdas (2022), which is outlined for completeness by Algorithm S2 in the Supplementary Material. (A minimal sketch of both steps appears below the table.) |
| Open Source Code | Yes | Software implementing the algorithms described in this paper and enabling the reproduction of the associated numerical experiments is available at https://github.com/Meshiba/derandomized-novelty-detection. |
| Open Datasets | Yes | To further demonstrate the effectiveness of data-driven weighting, we turn to analyze the performance of E-Ada Detect on four real-world outlier detection data sets: musk, shuttle, KDDCup99, and credit card. We refer to Supplementary Section S6 for more information regarding these data. |
| Dataset Splits | Yes | The size of the reference set is n = 2000, with 1000 samples in the training subset and 1000 in the calibration subset. (A split sketch appears below the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., CPU, GPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using a 'binary logistic regression classifier', 'random forests implemented with varying max-depth hyper-parameters', and 'support vector machines with an RBF kernel', but does not provide version numbers for the software libraries implementing these models or for their dependencies. |
| Experiment Setup | Yes | Specifically, we fit a sparse logistic regression model using K = 10 different values of the regularization parameter. To induce higher variability in the predictive rules, one model was trained with a regularization parameter equal to 0.0001, while the others were trained with regularization parameters equal to 1, 10, 50, and 100, respectively. ... Half of the models are random forests implemented with varying max-depth hyper-parameters (10, 12, 20, 30, and 7), while the other half are support vector machines with an RBF kernel with varying width hyper-parameters (0.1, 0.001, 0.5, 0.2, and 0.03). (A hypothetical reconstruction of these base models appears below the table.) |
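
The Pseudocode row above references two steps: aggregating per-run e-values (Algorithm S1) and applying the e-BH filter (Algorithm S2). The sketch below is a minimal, non-authoritative rendering of both, assuming an unweighted average of per-run e-values (the paper also develops data-driven weighting) and the e-BH rule of Wang and Ramdas (2022); function names are illustrative, and the authors' repository linked above contains the actual implementation.

```python
import numpy as np

def aggregate_e_values(per_run_e):
    """Average e-values across K runs (rows of a K-by-m array).
    An unweighted mean of e-values is itself a valid e-value,
    which is what makes derandomization possible; the paper also
    studies data-driven weights in place of the plain mean."""
    return np.asarray(per_run_e, dtype=float).mean(axis=0)

def ebh_filter(e_values, alpha=0.1):
    """e-BH filter (Wang and Ramdas, 2022): reject the hypotheses with
    the k largest e-values, where k is the largest index such that the
    k-th largest e-value is at least m / (alpha * k)."""
    e = np.asarray(e_values, dtype=float)
    m = len(e)
    order = np.argsort(-e)                  # indices sorted by decreasing e-value
    ks = np.arange(1, m + 1)
    passing = e[order] >= m / (alpha * ks)  # test e_(k) >= m / (alpha * k)
    if not passing.any():
        return np.array([], dtype=int)      # no discoveries
    k = ks[passing].max()
    return np.sort(order[:k])               # indices of rejected hypotheses
```

Rejecting at this adaptive threshold controls the false discovery rate at level alpha under arbitrary dependence among the e-values, which is the guarantee the paper builds on.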
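
The Dataset Splits row quotes a 2000-sample reference set divided evenly into training and calibration subsets. A minimal sketch of such a split is shown below; the function name and the use of a random permutation are assumptions for illustration.

```python
import numpy as np

def split_reference(X_ref, n_train=1000, seed=0):
    """Randomly split the reference (inlier) set into a training half,
    used to fit the base model, and a calibration half, used to compute
    conformal e-values; sizes match the 1000/1000 split quoted above."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X_ref))
    return X_ref[idx[:n_train]], X_ref[idx[n_train:]]
```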
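
The Experiment Setup row lists five random forests and five RBF-kernel support vector machines. The scikit-learn sketch below is a hypothetical reconstruction: it interprets the quoted "width" hyper-parameters as scikit-learn's gamma argument, and all remaining settings (including random seeds) are assumptions; consult the linked repository for the actual configuration.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Five random forests with the quoted max-depth values and five
# RBF-kernel SVMs with the quoted width values (read here as gamma).
max_depths = [10, 12, 20, 30, 7]
rbf_widths = [0.1, 0.001, 0.5, 0.2, 0.03]

models = (
    [RandomForestClassifier(max_depth=d, random_state=k)
     for k, d in enumerate(max_depths)]
    + [SVC(kernel="rbf", gamma=g, probability=True) for g in rbf_widths]
)
```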