Training Data Subset Selection for Regression with Controlled Generalization Error
Authors: Durga S, Rishabh Iyer, Ganesh Ramakrishnan, Abir De
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we present experimental results and analysis on several real-world datasets to evaluate the performance of SELCON against several competitive baselines. |
| Researcher Affiliation | Academia | ¹CSE Department, Indian Institute of Technology, Bombay; ²CS Department, University of Texas at Dallas. |
| Pseudocode | Yes | Algorithm 1 SELCON Algorithm |
| Open Source Code | Yes | Our code and data is available at https://github.com/abir-de/SELCON |
| Open Datasets | Yes | We experiment with five real world datasets, viz., Cadata (16718 instances), Law (20800 instances), NYSE-High (701348 instances), NYSE-Close (701348 instances), and Community-and-crime (1994 instances), all briefly described in Appendix D. ... Cadata (Pace & Barry, 1997): This dataset is available in scikit-learn. ... Law (Wightman, 1998)... NYSE (https://github.com/marefaand/stock_market_data): ... Community and Crime: ... is available at UCI ML repository. |
| Dataset Splits | Yes | In each experiment, we used (random) 89% training, 1% validation and 10% test folds. |
| Hardware Specification | No | The paper mentions 'GPUs, multicore processors, high storage disks' only in a general context and does not give specific hardware details, such as exact GPU/CPU models or the memory used in its experiments. |
| Software Dependencies | No | The paper mentions 'pytorch' but does not specify its version number, nor does it list other software dependencies with their versions. |
| Experiment Setup | Yes | Specifically, we set N = 2000 for Cadata and Law, N = 5000 for the NYSE datasets; and, b = min{\|S\|, 1000} across all datasets. Additionally, SELCON involves two more sets of small scale optimization problems (lines 3 and 8 respectively), where we set the number of epochs as 3. |
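
The Dataset Splits row quotes a random 89% / 1% / 10% train/validation/test partition. The Python sketch below shows one way such a split could be produced. The function name `random_split`, the fixed seed, and the rounding rule (training and validation sizes rounded down, remainder assigned to test) are assumptions for illustration, not details taken from the paper or its repository.

```python
import random

def random_split(n, train_pct=89, val_pct=1, seed=0):
    """Hypothetical helper: randomly partition indices 0..n-1 into
    train/validation/test lists using integer percentages.
    Assumption: train and validation sizes are rounded down and the
    remainder (about 10%) becomes the test fold."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)    # seeded shuffle for repeatability
    n_train = n * train_pct // 100      # integer arithmetic avoids float rounding
    n_val = n * val_pct // 100
    train = idx[:n_train]
    val = idx[n_train:n_train + n_val]
    test = idx[n_train + n_val:]
    return train, val, test

# Example with the size quoted for Community-and-crime (1994 instances).
tr, va, te = random_split(1994)
print(len(tr), len(va), len(te))  # → 1774 19 201
```

Computing the fold sizes with integer arithmetic keeps the counts deterministic: for example, the 16718-instance Cadata set would yield 14879, 167 and 1672 points under the same rounding assumption.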
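
The Experiment Setup row fixes a few per-dataset settings. A minimal sketch that records them and evaluates b = min{|S|, 1000} is given below. The names `N_BY_DATASET`, `INNER_EPOCHS` and `param_b` are hypothetical, Community-and-crime is omitted because the quoted text gives no N for it, and the roles of N and S are not interpreted beyond what the quote states.

```python
# Hypothetical record of the settings quoted in the Experiment Setup row.
N_BY_DATASET = {
    "Cadata": 2000,
    "Law": 2000,
    "NYSE-High": 5000,
    "NYSE-Close": 5000,
}

# Epochs for the two sets of small-scale optimization problems
# (lines 3 and 8 of the algorithm, per the quoted text).
INNER_EPOCHS = 3

def param_b(subset_size: int, cap: int = 1000) -> int:
    """Evaluate b = min{|S|, 1000}; subset_size stands for |S|."""
    return min(subset_size, cap)

# Illustrative values of |S| (assumptions, not taken from the paper).
print(param_b(500), param_b(2000))  # → 500 1000
```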