Simple Weak Coresets for Non-decomposable Classification Measures
Authors: Jayesh Malaviya, Anirban Dasgupta, Rachit Chhaya
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We provide experimental evidence of our results on real datasets and for various classifiers and sampling techniques. Figures 1 through 4 clearly show that uniform sampling gives superior or comparable performance to other sophisticated methods for both F1 score and MCC. (A minimal uniform-sampling evaluation sketch appears below the table.) |
| Researcher Affiliation | Academia | 1) Indian Institute of Technology, Gandhinagar; 2) DA-IICT, Gandhinagar |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper neither provides a direct link to source code for the described methodology nor states that the code has been released. |
| Open Datasets | Yes | Data Sets: The COVERTYPE (Blackard 1998) data consists of 581,012 cartographic observations of different forests with 54 features. The task is to predict the type of trees at each location (49% positive). The KDDCUP 99 (Stolfo et al. 1999) data comprises 494,021 network connections with 41 features, and the task is to detect network intrusions (20% positive). The Adult (Becker and Kohavi 1996) dataset is a widely used dataset containing information about individuals from the 1994 U.S. Census Bureau database. (A dataset-loading sketch appears below the table.) |
| Dataset Splits | No | The paper describes how the training data is used but does not give explicit train/validation/test splits (percentages or sample counts per subset), so the data partitioning cannot be reproduced exactly. |
| Hardware Specification | Yes | All experiments were run on a computer with Nvidia Tesla V100 GPU with 32 GB memory and 28 CPUs. |
| Software Dependencies | No | The paper mentions Python and scikit-learn but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | For MLP experiments, we considered a simple MLP classifier with two hidden layers of size 100 each and the final output layer of size two, as we are dealing with binary classification. The optimizer used for the MLP is Adam, and the activation function used is ReLU. (An MLP configuration sketch appears below the table.) |
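The Research Type row describes comparing uniform sampling against more sophisticated coreset constructions on F1 score and MCC. Below is a minimal sketch of that kind of evaluation loop; the synthetic data, logistic-regression classifier, and coreset budget are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch: fit on a uniformly sampled subset ("weak coreset")
# and score with F1 and MCC. Dataset, classifier, and coreset size are
# illustrative assumptions, not the paper's exact protocol.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=50_000, n_features=54, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

rng = np.random.default_rng(0)
coreset_size = 1_000  # assumed sampling budget
idx = rng.choice(len(X_train), size=coreset_size, replace=False)

clf = LogisticRegression(max_iter=1_000).fit(X_train[idx], y_train[idx])
pred = clf.predict(X_test)
print(f"F1:  {f1_score(y_test, pred):.3f}")
print(f"MCC: {matthews_corrcoef(y_test, pred):.3f}")
```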
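All three datasets quoted in the Open Datasets row have public loaders in scikit-learn (COVERTYPE and the 10% KDD Cup 99 subset directly, Adult via OpenML). A loading sketch follows; note that the binary task definitions quoted in the table (49% and 20% positive) are the paper's label mappings, which the loaders do not apply.

```python
# Sketch: fetching public versions of the three datasets named in the
# Open Datasets row. The paper's label binarization is not reproduced
# here; only the raw multi-class data is loaded.
from sklearn.datasets import fetch_covtype, fetch_kddcup99, fetch_openml

# COVERTYPE: 581,012 cartographic observations with 54 features.
cov = fetch_covtype()
print(cov.data.shape)  # (581012, 54)

# KDD Cup 99: the 10% subset has the 494,021 connections cited above.
kdd = fetch_kddcup99(percent10=True)
print(kdd.data.shape)  # (494021, 41)

# Adult: 1994 U.S. Census data, hosted on OpenML.
adult = fetch_openml("adult", version=2, as_frame=True)
print(adult.frame.shape)
```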
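The Experiment Setup row fully specifies the MLP architecture, so it can be written down directly. Here is a minimal sketch using scikit-learn's `MLPClassifier` with the quoted settings; hyperparameters the table does not state (learning rate, iteration count) are left at library defaults, which is an assumption.

```python
# Sketch of the MLP from the Experiment Setup row: two hidden layers of
# 100 units each, ReLU activation, Adam optimizer. Unstated
# hyperparameters are left at scikit-learn defaults (an assumption).
from sklearn.neural_network import MLPClassifier

mlp = MLPClassifier(
    hidden_layer_sizes=(100, 100),  # two hidden layers of size 100
    activation="relu",
    solver="adam",
    random_state=0,
)
# Note: for binary classification, scikit-learn sizes the output layer
# automatically as a single logistic unit; the paper's two-unit output
# layer suggests a softmax head as in other frameworks, but the learned
# decision function is equivalent.
```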