Feature selection using e-values
Authors: Subhabrata Majumdar, Snigdhansu Chatterjee
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We implement e-values using a GBS with scaled resample weights W_ri ∼ Gamma(1, 1) − 1, and resample sizes R = R1 = 1000. We use Mahalanobis depth for all depth calculations. Mahalanobis depth is much less computation-intensive than other depth functions (Dyckerhoff & Mozharovskyi, 2016; Liu & Zuo, 2014), but is not usually preferred in applications due to its non-robustness. However, we do not use any robustness properties of data depth, so are able to use it without any concern. For each replication for each data setting and method, we compute performance metrics on test datasets of the same dimensions as the respective training dataset. All our results are based on 1000 such replications. |
| Researcher Affiliation | Collaboration | School of Statistics, University of Minnesota Twin Cities, Minneapolis, MN, USA; currently at Splunk. Correspondence to: Subhabrata Majumdar <smajumdar@splunk.com>. |
| Pseudocode | Yes | Algorithm 1 Best subset selection using e-values |
| Open Source Code | Yes | Code and data for the experiments in this paper are available at https://github.com/shubhobm/e-values. |
| Open Datasets | Yes | Indian monsoon data... obtain data on 35 potential covariates (see Appendix D) from National Climatic Data Center (NCDC) and National Oceanic and Atmospheric Administration (NOAA) repositories for 1978–2012. |
| Dataset Splits | Yes | We train our model on data from the years 1978–2002, run e-values best subset selection for tuning parameters τn ∈ {0.05, 0.1, ..., 1}. We consider two methods to select the best refitted model: (a) minimizing GBIC(τn), and (b) minimizing forecasting errors on samples from 2003–2012. |
| Hardware Specification | Yes | All computations were performed on a Windows desktop with an 8-core Intel Core-i7 6700K 4GHz CPU and 16GB RAM. |
| Software Dependencies | No | The paper mentions statistical methods and distributions like 'GBS with scaled resample weights W_ri ∼ Gamma(1, 1) − 1' and 'Mahalanobis depth', but it does not provide specific software names with version numbers (e.g., Python 3.x, PyTorch 1.x, scikit-learn x.x.x). |
| Experiment Setup | Yes | We implement e-values using a GBS with scaled resample weights W_ri ∼ Gamma(1, 1) − 1, and resample sizes R = R1 = 1000. |
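The two ingredients the setup rows describe, Mahalanobis depth and scaled Gamma resample weights for a generalized bootstrap (GBS), are both simple to reproduce. The sketch below is ours, not code from the paper's repository; the function name `mahalanobis_depth` and the sample sizes are illustrative assumptions, while the weight recipe W ∼ Gamma(1, 1) − 1 (mean 0, variance 1) and R = 1000 follow the quoted text.

```python
import numpy as np

def mahalanobis_depth(x, mean, cov_inv):
    # Mahalanobis depth: 1 / (1 + squared Mahalanobis distance to the mean).
    # It is maximal (exactly 1) at the mean and decays with distance.
    d = x - mean
    return 1.0 / (1.0 + d @ cov_inv @ d)

rng = np.random.default_rng(0)

# Illustrative data: n = 500 observations in p = 5 dimensions.
X = rng.normal(size=(500, 5))
mu = X.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(X, rowvar=False))

print(mahalanobis_depth(mu, mu, cov_inv))  # → 1.0 at the mean

# Scaled resample weights for the generalized bootstrap:
# each weight is Gamma(1, 1) − 1, so E[W] = 0 and Var[W] = 1,
# with R = 1000 resamples as in the quoted setup.
R = 1000
W = rng.gamma(shape=1.0, scale=1.0, size=(R, X.shape[0])) - 1.0
```

Mahalanobis depth needs only one matrix inverse per dataset, which is consistent with the paper's remark that it is far cheaper than other depth functions.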